Abstract

In this paper we develop a new form of agent-based model for limit order books based on heterogeneous 
trading agents, whose motivations are liquidity driven. These agents are abstractions of real market
participants, expressed in a stochastic model framework. We develop an efficient way to perform statistical
calibration of the model parameters on Level 2 limit order book data from Chi-X, based on a combination of 
indirect inference and multi-objective optimisation. We then demonstrate how such an agent-based modelling 
framework can be of use in testing exchange regulations, as well as informing brokerage decisions and other 
trading based scenarios. 


1 Introduction 


In this paper, we develop a model that simulates trading activity in the Limit Order Book (LOB), the most
common form of market mechanism, utilised in major stock exchanges to match the buying and selling interest
in stocks [Jain 2003]. The LOB is a complicated, multivariate, event-driven stochastic process, resulting from the
combination of buy and sell orders being grouped into a multi-level queueing framework, and Gould et al. [2013]
provide a characterisation of some of the main attributes. As an indication of the complexity of this process, one
need only examine the attributes of the orders entering this set of LOB queues: Each order can be distinguished 
by order type, price, and size (in number of shares, number of contracts etc.). With regard to order type, there 
are limit orders, which enter at particular levels of the buy side (the bid) or the sell side (the ask) until executed 
or cancelled, or market orders, which are executed at the current best price. Time ordering is also important in 
establishing priority in a queue, and orders are typically given timestamps of millisecond or finer resolution by 
the trading venue. 

Our aim is to capture pervasive features of the LOB, which have been suggested to originate from the change 
in market structure over the last two decades. For example, the prominence of high frequency trading, which 
constitutes the majority of trading data today (approximately 73% according to Hendershott et al. [2011]), has
been suggested to be responsible for a rapid decline in the number of orders that remain in the LOB until
execution. Instead, orders are very frequently cancelled and resubmitted at different prices, either to gain priority,
or to reduce the risk of adverse selection (getting 'picked off' by a large trader). This induces dependencies between
different event types (limit order submissions and cancellations, for example), which are very likely non-linear, and
may be affected by prevailing market conditions.

The dynamics that may arise from a LOB stochastic process are challenging to model, and doing so in a
parsimonious manner is a particularly formidable task. Roşu [2009] discusses the complexity of modelling the
dynamics that emerge from the interaction of large numbers of anonymous traders, while Large [2007] suggests
that even when studying order replenishment alone, there are multiple dimensions to consider. Besides the
trading interest itself, there are numerous features that could also be incorporated into a model for intra-day
trading on a financial exchange, which include particulars regarding the exchange mechanism that matches the
trading interest in a particular asset, as well as exchange-specific rules governing the operation of the market
under certain conditions.

There have been two approaches that have prevailed in the LOB modelling literature. Firstly, agent-based
frameworks, which typically involve a large number of economic agents interacting under a restricted set of agent
attributes. Cristelli et al. [2011] organise several such models according to their ability to interpret real market
participant behaviours, as well as tractability, and find that these two axes are very much at odds. As an
example, the simplicity of the agent behaviours considered by Farmer et al. [2005] and Maslov [2000] makes their
interpretation in terms of real market participant activity difficult. On the other hand, there have been efforts
to introduce influences from real market behaviours, e.g. by Arthur et al. [1996] and Chiarella and Iori [2002], but
several of these models have methodological problems related to empirical validation, discussed in Windrum et al.
[2007], or the calibration is not based on well understood simulation-based estimation frameworks.


The second approach to LOB modelling considers pure stochastic model frameworks, see for instance Christensen
et al. [2013]. This approach abstracts away the market participant from the modelling process. Instead, a
stochastic modelling approach is taken, where the complex trading dynamics are distilled into a set of statistical
assumptions. These models can capture key empirical properties of the processes comprising the LOB stochastic
structure [Cont et al. 2010, Huang and Kercheval 2012]. They also give rise to LOB simulation frameworks
which feature these same properties, see for instance Christensen et al. [2013] and Daniels et al. [2003].


In this paper, we propose a third type of hybrid approach based on a selection of attributes from each of these 
methods. In particular, we develop a new form of agent-based model for limit order books based on liquidity 
motivated agents, in which the LOB price and volume dynamics are emergent features of the interaction between 
abstractions of real-world market participants. We develop two types of such agents in our framework, namely 
liquidity providers (market makers), and liquidity demanders, with the latter forming a stylised representation of 
algorithmic traders, noise traders, trend followers and other types of speculators. Their activity is expressed in 
a stochastic model framework, which is more detailed than typical simple agent models. This places our model 
part way between a traditional agent-based model and a pure limit order book stochastic model. 

The model is structured to allow for efficient calibration under a rigorous statistical estimation framework. 
We introduce a new simulation-based estimation approach based on a combination of Indirect Inference and 
multi-objective optimisation. We calibrate our representative agent stochastic model to real high frequency
Level 2 limit order book data from Chi-X. We show how such a procedure can be used to estimate the model,
such that the resulting simulations approximate real data in more than one aspect, in our case the behaviour of 
the intra-day price and volume processes. 

A practical benefit of the agent-based modelling approach we develop is that one can utilise it to estimate the 
effect of a regulatory intervention. In modern LOBs there have been proposals to introduce regulation in order 
to curb high frequency trading, in cases where it is seen to be harmful to market quality. Under the stochastic 
agent-based modelling framework we are able to evaluate the effect of a ‘quote-to-trade ratio’ imposition, which 
has been discussed in this context. The empirical predictions of the model suggest that the imposition of such a 
ratio is, ceteris paribus, sufficient to limit extreme intra-day volatility in the price process.

Our work contributes to the field of LOB modelling in a number of ways: Firstly, our model has structural 
components which are directly interpretable and easily understood in terms of market participants’ behaviours. 
Compared to traditional agent-based models, which have considered the segmentation of the agent population 
into an element concerned with price fundamentals and another concerned with recent price fluctuations,^1 our
demarcation according to liquidity motivations is more reflective of current market behaviours. Secondly, it is 
able to capture key attributes of the observed LOB process, such as dynamics of asset price evolution, liquidity 
dynamics and volume process attributes. In addition, the model is able to capture the dependence in the 
intensity of limit order, market order and cancellation activity at different levels of the LOB, which has not been 
considered in previous models. Finally, as a contribution to the calibration of simulation models in general, the 
paper contributes a new statistical estimation framework for simulation models that is both rigorous and efficient. 

The rest of the paper is organised as follows. Section 2 provides an overview of both the agent-based and
stochastic LOB modelling literature, both of which this paper draws from. Section 3 presents a formal
mathematical specification of each component of our stochastic representative agent-based model. The subsequent
sections introduce the estimation procedure employed in this paper, along with the features of the real data that
we are interested in calibrating the model against, present the results from estimating various versions of the
model of increasing complexity, and present a case study of the introduction of a quote-to-trade ratio in the
simulated market, before concluding.


2 Related literature 

2.1 Background on LOB simulation dynamics: Agent-based models 

In the agent-based modelling literature for financial market simulations it is common practice to divide the
trading population into fundamentalist and chartist traders. Early studies, such as that by Taylor and Allen
[1992], undertook surveys on a number of London-based dealers to characterise the trading behaviours prevalent
at the time. Such surveys refer to fundamentalist traders as deriving their views from an economic analysis of the
traded asset. In the context of an agent-based model, fundamentalist traders distil their economic analysis into a
single figure, the fundamental price of the asset, and trade accordingly. Being a chartist dealer, on the other hand,
involved 'providing forecasts or trading advice on the basis of largely visual inspection of past prices, without
regard to any underlying economic or fundamental analysis'. Common chartist behaviours in an agent-based
model include making decisions based on the price of the asset, compared to its moving average in a particular
period, or assuming that a short move in a certain direction will continue in the near future (a momentum
strategy).


The chartist and fundamentalist literature in agent-based modelling began with the works of Frankel and Froot
[1988] and Kirman [1993], and was then developed further by, for example, Farmer and Joshi [2002], Westerhoff
and Reitz [2003], Youssefmir et al. [1998] and Vigfusson [1997], amongst others. As a first step in capturing
heterogeneity in trading behaviours, this distinction in trading behaviours is important, and showed that there
was a useful middle ground to explore between zero-intelligence-agent type approaches and perfect rationality
models. However, markets have evolved, and the behaviours of participants have changed accordingly.

A more relevant division of trading behaviours in modern markets is between the buy and sell side, with
the latter providing liquidity to the former. More recent models of agent-based LOB activity have considered
liquidity provision as a way to distinguish between the different types of agents (see, e.g. Preis et al. [2007] for
an example and LeBaron [2006] for a review of related work). On the one hand, we have liquidity providers, who
may have quoting obligations (i.e. they are designated market makers^2), or not (which is usually the case with
high frequency traders). On the other, we have liquidity demanders (or liquidity traders), whose need to trade
is unrelated to the model, or to the price. Examples of these include fund managers in passive index funds.

^1 The chartist and fundamentalist approach to agent-based modelling is covered in detail in Section 2.1.

Our model also assumes the existence of these two types of agents, but it differs from most ABMs in that we do 
not model individual agents explicitly. We cannot claim to have precise knowledge of the strategies employed by 
any type of trader, and in any case, implementing even a small subset of such strategies would be a very difficult 
undertaking, due to the recent nature and complexity of a variety of high frequency trading firms' strategies. We
would expect, however, that the aggregation of the order flow from a class of agents would be more amenable to 
modelling. This provides the motivation for considering this agent activity in a stochastic modelling framework. 


2.2 Background on LOB simulation dynamics: Stochastic (non agent-based) 

The class of stochastic models is motivated more from a statistical perspective, where several components of 
market structure, as well as the details of market participant strategies, are abstracted away by a set of stylising 
statistical assumptions. The objective is usually to model a particular feature of the LOB process, such as the 
price or volume process, through a stochastic model. Particularly in response to empirical studies describing the
change of market structure over time,^3 such models have been used to help understand stock price dynamics at
much shorter time intervals [Cartea and Jaimungal 2013].

In terms of the approaches used in this context, several authors have considered the LOB as a set of queues
of orders at each price, and for either side (bid and ask), and as a consequence, employ queue-type stochastic
structures to perform LOB simulations. Examples of these include the queuing system proposed by Cont et al.
[2010] and, in a simpler specification, by Cont and De Larrard [2013]. Under this model, the LOB is treated as
a continuous time Markov chain, where all event types (limit orders at every level, cancellations, market orders)
are mutually independent. They show that the assumption of a power law characterising the limit order intensity
functions, as one moves from the best bid or ask, is a good match with empirical observations. They also obtain
conditional probabilities of various LOB events that may be of interest in algorithmic trading.
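
As a minimal sketch of the queueing view described above, the following Python fragment draws the next event of a continuous time Markov chain whose event types arrive as mutually independent Poisson processes, with limit order rates decaying as a power law in the distance from the best quote. The functional form `scale / distance ** exponent`, the parameter values and the event names are illustrative assumptions, not values taken from the cited papers.

```python
import random


def power_law_intensity(distance_ticks, scale, exponent):
    """Assumed arrival rate of limit orders `distance_ticks` away from the best quote."""
    if distance_ticks < 1:
        raise ValueError("distance must be at least one tick")
    return scale / distance_ticks ** exponent


def next_event(rates, rng):
    """Sample (waiting time, event type) for independent Poisson event streams.

    The waiting time is exponential with the total rate, and the event type is
    chosen with probability proportional to its own rate.
    """
    total = sum(rates.values())
    wait = rng.expovariate(total)
    u = rng.random() * total
    cumulative = 0.0
    for name, rate in rates.items():
        cumulative += rate
        if u < cumulative:
            return wait, name
    return wait, name  # guards against floating-point round-off


# Hypothetical rates: limit orders at 1..3 ticks, plus market orders and cancellations.
rates = {f"limit_{i}": power_law_intensity(i, scale=2.0, exponent=1.0) for i in (1, 2, 3)}
rates["market"] = 0.5
rates["cancel"] = 1.0
wait, event = next_event(rates, random.Random(0))
```

Because the event streams are independent, a single exponential clock with the summed rate is sufficient; this is precisely the assumption that the dependence structures discussed later in this section relax.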

In an important extension of the Markov queueing system as an LOB simulator, Huang et al. [2013] consider
the trading activity in unevenly spaced intervals in which the reference price (the mid price in this case) is
constant. They also introduce some trivial dependence between activity at different levels of the LOB, in order
to explain the consumption of liquidity beyond the first LOB level, when there is no resting volume at the first
level. Simulations of the model using purely the event processes cannot closely reflect macro-level features of
real markets, and some assumptions about the distribution of the resting volume beyond the first few levels are
required for this.

The arrival process of limit orders, as well as market orders and cancellations, is one of the most commonly
modelled LOB aspects. For example, in their effort to explain the concavity of the price impact function observed
across stocks, Smith et al. [2003] considered limit order arrival rates on the bid and ask side as independent Poisson
processes, and orders priced relative to the extant bid and offer prices. Their simplifications pertained to pricing
on an infinite grid, constant size orders, and a constant cancellation rate. Later studies, however [Bowsher 2007,
Large 2007], observed clustering in trading activity, which is a feature of the LOB that cannot be captured by
modelling order arrivals as independent Poisson processes. Instead, Bowsher [2007] and Large [2007] proposed the
use of univariate and multivariate Hawkes processes, to explain the clustering of trades and limit order arrivals
after a trade (i.e. order replenishment), respectively. Recently, Huang et al. [2013] also suggested a simple Markov
queueing system to capture the dependence in the consumption of liquidity in the first two levels of the LOB.
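
To illustrate why a Hawkes process produces the clustering that independent Poisson arrivals cannot, the sketch below simulates a univariate Hawkes process with an exponential kernel using a thinning scheme. The kernel form, the function name `simulate_hawkes` and the parameter values are assumptions chosen for illustration; they are not the specifications estimated by Bowsher [2007] or Large [2007].

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Event times on [0, horizon) for intensity mu + sum_i alpha * exp(-beta * (t - t_i)).

    Thinning: between events the intensity only decays, so its current value is a
    valid upper bound for proposing the next candidate time.
    """
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity, evaluated at time t
    while True:
        bound = mu + excitation
        wait = rng.expovariate(bound)
        t += wait
        if t >= horizon:
            break
        excitation *= math.exp(-beta * wait)      # decay up to the candidate time
        if rng.random() * bound <= mu + excitation:
            events.append(t)                       # accept the candidate as an event
            excitation += alpha                    # each event raises future intensity
    return events


# Hypothetical parameters; alpha < beta keeps the process stationary.
arrivals = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, rng=random.Random(0))
```

Setting `alpha=0` recovers a homogeneous Poisson process with rate `mu`, which makes the role of the self-excitation term easy to isolate when comparing simulated inter-arrival times.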


^2 https://www.nyse.com/market-model/dmm-case-studies

^3 Hasbrouck and Saar [2013] and Hendershott et al. [2011] provide evidence of the increasing representation of high frequency
trading firms in the market, although Brogaard et al. [2014] do not find that this increases institutional execution costs.

Other stochastic models proposed for LOB simulations include that by Roşu [2009], who introduced an LOB
model which was intended to provide an alternative explanation for the submission of orders at different levels
of the LOB, compared to the adverse selection risk favoured by the market microstructure theory. Instead,
traders are assumed to have a higher expected utility from trading at a more favourable price, but lose utility
proportionally to their waiting time when trading via limit orders. The model predicts that a competitive bid-ask
spread can result from competition between liquidity providers, and that the possibility of large market orders
can lead to a hump-shaped LOB.

While the overall motivation for the agent behaviours we consider in our model comes from their liquidity 
impulses, the stochastic models we consider for their trading activity are related to the family of models described 
in this section. The assumption of independent, homogeneous Poisson processes for limit order arrivals is fairly 
simplistic, and we therefore incorporate dependence in the limit order arrival intensities at different levels of the 
LOB, as part of a flexible parametric model which we describe in the following section. 


3 New perspective: Stochastic agent-based models for the LOB 

In this section we present the formal mathematical specification for each component of our stochastic
agent-based model. This includes the stochastic models for limit order placements and cancellations by a liquidity
provider agent and the stochastic models for market order placements by liquidity demanding representative
agents. The stochastic ABM framework can model the non-linear dependence in intra-day LOB activity, where
the dependence is considered both between different types of events (e.g. limit and market orders) and between
events of the same type at different levels (e.g. cancellations at level 2 and level 5 of the ask side of the LOB).
We make extensive use of the flexible multivariate skew-t distribution, which is unique in enabling the modelling
of heavy tails, tail dependence, skew and clustering of volatility [Demarta and McNeil 2005, Fung and Seneta
2010].


3.1 Limit Order Book simulation framework 

We consider the intra-day LOB activity in fixed intervals of time $\ldots, [t-1, t), [t, t+1), \ldots$. For every interval
$[t, t+1)$, we allow the total number of levels on the bid or ask sides of the LOB to be dynamically adjusted as the
simulation evolves. These LOB levels are defined with respect to two reference prices, equal to the price of the
highest bid and the price of the lowest ask at the start of the interval. We consider these reference
prices to be constant throughout the interval and thus, the levels on the bid side of the book are defined
at an integer number of ticks away from the ask reference price, while the levels on the ask side of the book are
defined at an integer number of ticks away from the bid reference price.

This does not mean that we expect the best bid and ask prices to remain constant, just that we model
the activity (i.e. limit order arrivals, cancellations and executions) according to the distance in ticks from
these reference prices during this period. We note that it is of course possible that the volume at the best bid
price is consumed during the interval, and that limit orders to sell are posted at this price, which would be
considered at 0 ticks away from the reference price. To allow for this possibility, we actively model the activity at
$-l_d+1, \ldots, 0, \ldots, l_p$ ticks away from each reference price. Here, the $p$ subscript refers to passive orders, i.e.
orders which would not lead to immediate execution if the reference prices remained constant, while $d$ refers to
direct, or aggressive, orders, where it is again understood that they are aggressive with respect to the reference prices
at the start of the period. Therefore, we actively model the activity at a total of $l_t = l_p + l_d$ levels on the bid and
ask, as indicated in the corresponding figure.

We assume that activity that occurs further away is uncorrelated with the activity close to the top of the
book (as is evident in the corresponding figure), and is therefore unlikely to have much of an impact on price evolution and
the properties of the volume process. Therefore, the volume resting outside the actively modelled LOB levels
$\{-l_d+1, \ldots, 0, \ldots, l_p\}$ on the bid and ask is assumed to remain unchanged until the agent interactions bring
those levels inside the band of actively modelled levels.
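
The level indexing can be made concrete with a short sketch. It assumes one reading of the text above: that a sell order posted at the old best bid sits at offset 0, so ask-side offsets are counted upwards from the bid reference price and, symmetrically, bid-side offsets are counted downwards from the ask reference price. The function name `level_prices`, the integer price units and the example values are hypothetical.

```python
def level_prices(ref_bid, ref_ask, tick, l_d, l_p):
    """Map tick offsets -l_d+1, ..., 0, ..., l_p to prices on each side of the book.

    Prices are integers (e.g. pence) to avoid floating-point comparison issues.
    Offsets <= 0 are aggressive (they would execute immediately if the reference
    prices stayed fixed); offsets >= 1 are passive.
    """
    offsets = range(-l_d + 1, l_p + 1)
    ask_levels = {s: ref_bid + s * tick for s in offsets}  # sell orders
    bid_levels = {s: ref_ask - s * tick for s in offsets}  # buy orders
    return bid_levels, ask_levels


# Hypothetical book: best bid 1000, best ask 1002, one-unit tick, l_d = 2, l_p = 3.
bid_levels, ask_levels = level_prices(ref_bid=1000, ref_ask=1002, tick=1, l_d=2, l_p=3)
```

With these inputs there are `l_p + l_d = 5` actively modelled levels per side, and the current best ask (1002) appears at ask-side offset 2 because the spread is two ticks wide.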

To present the details of the simulation framework, including the stochastic model components for each agent,
i.e. liquidity providers and liquidity demanders, we first define the following notation:

1. $V_t^{a} = (V_t^{a,-l_d+1}, \ldots, V_t^{a,l_p})$ - the random vector for the number of orders resting at each of the
actively modelled levels on the ask side of the LOB at time $t$

2. $N_t^{LO,a} = (N_t^{LO,a,-l_d+1}, \ldots, N_t^{LO,a,l_p})$ - the random vector for the number of limit orders entering the limit
order book on the ask side at each level in the interval $[t-1, t)$

3. $N_t^{C,a} = (N_t^{C,a,-l_d+1}, \ldots, N_t^{C,a,l_p})$ - the random vector for the number of limit orders cancelled on the ask side
in the interval $[t-1, t)$

4. $N_t^{MO}$ - the random variable for the number of market orders submitted by liquidity demanders in the
interval $[t-1, t)$

We consider the processes for limit orders and market orders, as well as cancellations to be linked to the 
behaviour of real market participants in the LOB. In the following, we model the aggregation of the activity 
of 2 classes of liquidity motivated agents, namely liquidity providers and liquidity demanders. As we model 
LOB activity in discrete time intervals, we process the aggregate activity at the end of each time interval in the 
following order: 

1. Limit order arrivals - passive - by the liquidity provider agent 

2. Limit order arrivals - aggressive or direct - by the liquidity provider agent 

3. Cancellations by the liquidity provider agent 

4. Market orders by the liquidity demander agent. 

The rationale for this ordering is that the vast majority of limit order submissions and cancellations is typically 
accounted for by the activity of high-frequency traders, and many resting orders are cancelled before slower traders 
can execute against them. In addition, such an ordering allows us to condition on the state of the LOB, so that 
we do not have more cancellations at a particular level than the orders resting at that level. We do not see this 
as a limitation, as the time interval we consider can be made as small as desired for a given simulation. 
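
The processing order can be sketched for one side of the book as follows. This is a deliberately simplified illustration: books are dictionaries from tick offset to order count, every order has unit size, each market order removes exactly one resting order, and aggressive limit orders are simply added to the book rather than matched against the opposite side. The function name `step_one_side` and the example numbers are hypothetical.

```python
def step_one_side(resting, passive_lo, aggressive_lo, cancels, market_orders):
    """Apply one interval's aggregate activity in the order listed above.

    All arguments except `market_orders` map tick offset -> number of orders.
    Returns the updated book and the number of unfilled market orders.
    """
    book = dict(resting)
    # Steps 1 and 2: passive, then aggressive, limit order arrivals.
    for arrivals in (passive_lo, aggressive_lo):
        for level, count in arrivals.items():
            book[level] = book.get(level, 0) + count
    # Step 3: cancellations, capped at the number of orders resting at the level.
    for level, count in cancels.items():
        available = book.get(level, 0)
        book[level] = available - min(count, available)
    # Step 4: market orders consume the most aggressively priced levels first.
    remaining = market_orders
    for level in sorted(book):
        filled = min(remaining, book[level])
        book[level] -= filled
        remaining -= filled
    return book, remaining


book, unfilled = step_one_side(
    resting={1: 2, 2: 1},
    passive_lo={1: 1},
    aggressive_lo={0: 1},
    cancels={2: 5},
    market_orders=3,
)
```

In this example the five requested cancellations at offset 2 are capped at the single order resting there, which is the conditioning on the LOB state that the ordering is designed to permit.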


3.2 Stochastic agent representation: liquidity providers and demanders 


We assume liquidity providers are responsible for all market-making behaviour (i.e. limit order submissions and 
cancellations on both the bid and ask side of the LOB). After liquidity is posted to the LOB, liquidity seeking 
market participants, such as mutual funds using some execution algorithm, can take advantage of the resting 
volume with market orders. For market makers, achieving a balance between volume executed on the bid and 
the ask side can be profitable; however, there is also the risk of adverse selection, i.e. trading against a trader 
with superior information, which may lead to losses if, e.g. a trader posts multiple market orders that consume 
the volume on several levels of the LOB. The risk of adverse selection as a result of asymmetric information is
one of the basic tenets of market microstructure theory [O'Hara 1995]. To reduce this risk, market makers cancel
and resubmit orders at different prices and/or different sizes.
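
Definition 1 (Limit order submission process for the liquidity provider agent). Consider the limit
order submission process of the liquidity provider agent to include both passive and aggressive limit orders on the
bid and ask sides of the book, assumed to have the following stochastic model structure:

1. Let the multivariate path-space random matrix $N_{1:T}^{LO,k} \in \mathbb{N}_+^{l_t \times T}$ be constructed from random vectors for
the numbers of limit order placements $N_{1:T}^{LO,k} = (N_1^{LO,k}, N_2^{LO,k}, \ldots, N_T^{LO,k})$. Furthermore, assume these
random vectors for the number of orders at each level at time $t$ are each conditionally dependent on a
latent stochastic process for the intensity at which the limit orders arrive, given by the random matrix
$\Lambda_{1:T}^{LO,k} \in \mathbb{R}_+^{l_t \times T}$ and on the path-space by $\Lambda_{1:T}^{LO,k} = (\Lambda_1^{LO,k}, \Lambda_2^{LO,k}, \ldots, \Lambda_T^{LO,k})$. In the following, $k \in \{a, b\}$
indicates the respective process on the ask and bid side.

2. Assume the conditional independence property for the random vectors

$$\left[ N_s^{LO,k} \mid \Lambda_s^{LO,k} \right] \perp\!\!\!\perp \left[ N_t^{LO,k} \mid \Lambda_t^{LO,k} \right], \quad \forall s \neq t, \; s, t \in \{1, 2, \ldots, T\}. \tag{1}$$

3. For each time interval $[t-1, t)$ from the start of trading on the day, let the random vector for the
number of new limit orders placed in each actively modelled level of the limit order book, i.e. the price
points corresponding to ticks $(-l_d+1, \ldots, 0, 1, \ldots, l_p)$, as depicted in the corresponding figure, be denoted by
$N_t^{LO,k} = (N_t^{LO,k,-l_d+1}, \ldots, N_t^{LO,k,l_p})$, and assume that these random vectors satisfy the conditional independence
property

$$\left[ N_t^{LO,k,s} \mid \Lambda_t^{LO,k,s} \right] \perp\!\!\!\perp \left[ N_t^{LO,k,q} \mid \Lambda_t^{LO,k,q} \right], \quad \forall s \neq q, \; s, q \in \{-l_d+1, \ldots, l_p\}. \tag{2}$$

4. Assume the random vector $N_t^{LO,k} \in \mathbb{N}_+^{l_t}$ is distributed according to a multivariate generalized Cox process
with conditional distribution $N_t^{LO,k} \sim GCP(\lambda_t^{LO,k})$ given by

$$\Pr\left( N_t^{LO,k,-l_d+1} = n_{-l_d+1}, \ldots, N_t^{LO,k,l_p} = n_{l_p} \,\middle|\, \Lambda_t^{LO,k} = \lambda_t^{LO,k} \right) = \prod_{s=-l_d+1}^{l_p} \frac{\left(\lambda_t^{LO,k,s}\right)^{n_s}}{n_s!} \exp\left( -\lambda_t^{LO,k,s} \right). \tag{3}$$

5. Assume the independence property for random vectors of latent intensities unconditionally according to

$$\Lambda_s^{LO,k} \perp\!\!\!\perp \Lambda_t^{LO,k}, \quad \forall s \neq t, \; s, t \in \{1, 2, \ldots, T\}. \tag{4}$$

6. Assume that the intensity random vector $\Lambda_t^{LO,k} \in \mathbb{R}_+^{l_t}$ is obtained through an element-wise transformation
of the random vector $\Gamma_t^{LO,k} \in \mathbb{R}^{l_t}$, where for each element we have the mapping

$$\Lambda_t^{LO,k,s} = \mu_0^{LO,k,s} \, F\left( \Gamma_t^{LO,k,s} \right), \tag{5}$$

where we have $s \in \{-l_d+1, \ldots, l_p\}$, baseline intensity parameters $\mu_0^{LO,k,s} \in \mathbb{R}_+$ and a strictly monotonic
mapping $F : \mathbb{R} \mapsto [0, 1]$.

7. Assume the random vector $\Gamma_t^{LO,k} \in \mathbb{R}^{l_t}$ is distributed according to a multivariate skew-t distribution
$\Gamma_t^{LO,k} \sim MSt(m^k, \beta^k, \nu^k, \Sigma^k)$ with location parameter vector $m^k \in \mathbb{R}^{l_t}$, skewness parameter vector $\beta^k \in \mathbb{R}^{l_t}$, degrees
of freedom parameter $\nu^k \in \mathbb{N}_+$ and $l_t \times l_t$ covariance matrix $\Sigma^k$. Hence, $\Gamma_t^{LO,k}$ has density function

$$f_{\Gamma_t^{LO,k}}\left( \gamma_t; m^k, \beta^k, \nu^k, \Sigma^k \right) = \frac{c \, K_{\frac{\nu^k + l_t}{2}}\left( \sqrt{\left( \nu^k + Q(\gamma_t, m^k) \right) [\beta^k]^T [\Sigma^k]^{-1} \beta^k} \right) \exp\left( (\gamma_t - m^k)^T [\Sigma^k]^{-1} \beta^k \right)}{\left( \sqrt{\left( \nu^k + Q(\gamma_t, m^k) \right) [\beta^k]^T [\Sigma^k]^{-1} \beta^k} \right)^{-\frac{\nu^k + l_t}{2}} \left( 1 + \frac{Q(\gamma_t, m^k)}{\nu^k} \right)^{\frac{\nu^k + l_t}{2}}}, \tag{6}$$

where $K_v(z)$ is a modified Bessel function of the second kind given by

$$K_v(z) = \frac{1}{2} \int_0^\infty y^{v-1} \exp\left( -\frac{z}{2} \left( y + y^{-1} \right) \right) dy, \tag{7}$$

and $c$ is a normalisation constant. We also define the function $Q(\cdot, \cdot)$ as follows:

$$Q(\gamma_t, m^k) = (\gamma_t - m^k)^T [\Sigma^k]^{-1} (\gamma_t - m^k). \tag{8}$$

This model also admits skew-t marginals and a skew-t copula, see Smith et al. [2012] for details. Importantly,
this stochastic model admits the following scale mixture representation,

$$\Gamma_t^{LO,k} = m^k + \beta^k W + \sqrt{W} Z, \tag{9}$$

with inverse-gamma random variable $W \sim IGa\left( \frac{\nu^k}{2}, \frac{\nu^k}{2} \right)$ and independent Gaussian random vector $Z \sim N(0, \Sigma^k)$.

8. Assume that for every element $N_t^{LO,k,s}$ of order counts from the random vector $N_t^{LO,k}$, there is a
corresponding random vector $O_t^{LO,k,s} \in \mathbb{N}_+^{N_t^{LO,k,s}}$ of order sizes. We assume that the element $O_{i,t}^{LO,k,s}$,
$i \in \{1, \ldots, N_t^{LO,k,s}\}$, is distributed as $O_{i,t}^{LO,k,s} \sim H(\cdot)$. Furthermore, we assume that order sizes are
unconditionally independent, $O_{i,t}^{LO,k,s} \perp\!\!\!\perp O_{i',t'}^{LO,k,s}$ for $i \neq i'$ or $t \neq t'$.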
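
The chain of assumptions in Definition 1 can be traced in a small simulation for one side of the book and one interval: draw the latent skew-t vector through its scale mixture representation, map it to intensities with a monotonic $F$, and then draw conditionally independent Poisson counts per level. This is a sketch under stated assumptions rather than the authors' implementation: the logistic function is used as one admissible choice of $F$, only the standard library is used, and all function names and parameter values are hypothetical.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_skew_t(m, beta, nu, chol, rng):
    """Gamma = m + beta * W + sqrt(W) * Z with W ~ inverse-gamma(nu/2, nu/2)."""
    w = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)  # gammavariate takes (shape, scale)
    eps = [rng.gauss(0.0, 1.0) for _ in m]
    z = [sum(chol[i][k] * eps[k] for k in range(i + 1)) for i in range(len(m))]
    return [m[i] + beta[i] * w + math.sqrt(w) * z[i] for i in range(len(m))]


def logistic(x):
    """Assumed choice of F: strictly monotonic map into [0, 1], written stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_limit_order_counts(m, beta, nu, sigma, mu0, rng):
    """One draw of per-level limit order counts and their intensities."""
    gamma = sample_skew_t(m, beta, nu, cholesky(sigma), rng)
    lam = [mu0[s] * logistic(gamma[s]) for s in range(len(m))]
    return [sample_poisson(rate, rng) for rate in lam], lam
```

A call such as `sample_limit_order_counts([0, 0, 0], [0.1, 0.1, 0.1], 5, sigma, [4.0, 3.0, 2.0], random.Random(1))` with a 3 x 3 covariance matrix `sigma` returns three counts whose dependence arises only through the shared mixing variable $W$ and the correlated Gaussian vector $Z$, since the counts are independent given the intensities.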

We now define the second component of the liquidity provider agents, namely the cancellation process. The
cancellation process has the same stochastic process model specification as the limit order submission process
above, including a skew-t dependence structure between the stochastic intensities at each LOB level on the
bid and ask. We therefore only specify the differences unique to the cancellation process relative to the order
placement model definition in the specification below, to avoid repetition.

Definition 2 (Limit order cancellation process for liquidity provider agent). Consider the limit order
cancellation process of the liquidity provider agent to have an identically specified stochastic model structure as
the limit order submissions. The exception to this pertains to the assumption that the number of cancelled orders
in each interval at each level is right-truncated at the total number of orders at that level.

1. As for submissions, we assume for cancellations a multivariate path-space random matrix $N_{1:T}^{C,k} \in \mathbb{N}_+^{l_t \times T}$
constructed from random vectors for the number of cancelled orders given by $N_{1:T}^{C,k} = (N_1^{C,k}, N_2^{C,k}, \ldots, N_T^{C,k})$.
Furthermore, assume for these random vectors for the number of cancelled orders at each of the $l_t$ levels,
the latent stochastic process for the intensity is given by the random matrix $\Lambda_{1:T}^{C,k} \in \mathbb{R}_+^{l_t \times T}$ and given on the
path-space by $\Lambda_{1:T}^{C,k} = (\Lambda_1^{C,k}, \Lambda_2^{C,k}, \ldots, \Lambda_T^{C,k})$.

2. Assume that for the random vector for the volume resting in the LOB after the placement of limit orders
we have $\tilde{V}_t^{k} = V_{t-1}^{k} + N_t^{LO,k}$, and that the random vector $N_t^{C,k} \in \mathbb{N}_+^{l_t}$ is distributed according to a truncated
multivariate generalized Cox process with conditional distribution $N_t^{C,k} \mid \tilde{V}_t^{k} = v \sim GCP(\lambda_t^{C,k}) \, I(N_t^{C,k} \leq v)$
(with $v = (v_{-l_d+1}, \ldots, v_{l_p})$) given by

$$\Pr\left( N_t^{C,k,-l_d+1} = n_{-l_d+1}, \ldots, N_t^{C,k,l_p} = n_{l_p} \,\middle|\, \Lambda_t^{C,k} = \lambda_t^{C,k}, \tilde{V}_t^{k} = v \right) = \prod_{s=-l_d+1}^{l_p} \frac{\left( \lambda_t^{C,k,s} \right)^{n_s} / n_s!}{\sum_{j=0}^{v_s} \left( \lambda_t^{C,k,s} \right)^{j} / j!}. \tag{10}$$

3. Assume that for the cancellation count $N_t^{C,k,s}$, the orders with highest priority are cancelled from level $s$
(which are also the oldest orders in their respective queue). Assume also that cancellations always remove
an order in full, i.e. there are no partial cancellations.
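
The right-truncation in Definition 2 can be illustrated for a single level. The sketch below builds the truncated Poisson weights $\lambda^j / j!$ for $j = 0, \ldots, v$ and samples a cancellation count that can never exceed the number of resting orders $v$. It assumes the intensity `lam` has already been drawn, and the function names and numerical values are hypothetical.

```python
import random


def truncated_poisson_weights(lam, cap):
    """Unnormalised weights lam**j / j! for j = 0..cap, built recursively."""
    weights = [1.0]
    for j in range(1, cap + 1):
        weights.append(weights[-1] * lam / j)
    return weights


def truncated_poisson_pmf(lam, cap):
    """Probabilities of a Poisson(lam) count conditioned on being at most `cap`."""
    weights = truncated_poisson_weights(lam, cap)
    total = sum(weights)
    return [w / total for w in weights]


def sample_cancellations(lam, resting, rng):
    """Draw the number of cancelled orders at one level, capped at `resting`."""
    weights = truncated_poisson_weights(lam, resting)
    return rng.choices(range(resting + 1), weights=weights)[0]


# Hypothetical level with intensity 2.0 and three resting orders.
pmf = truncated_poisson_pmf(2.0, 3)
n_cancelled = sample_cancellations(2.0, 3, random.Random(0))
```

Building the weights recursively avoids evaluating large factorials directly. The same construction applies to the market order count of the liquidity demander agent, with the cap given by the volume resting on the opposite side of the book.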

We complete the specification of the representative agents by considering the specification of the liquidity 
demander agent. 

Definition 3 (Market order submission process for liquidity demander agent). Consider a represen¬ 
tative agent for the liquidity providers to be composed of a market order component, which has the following 
stochastic structure: 

1. Assume a path-space random vector for the number of market orders constructed from the 

random variables for the number of market orders in each interval ,..., 

Furthermore, assume that for these random variables the latent stochastic process for the intensity is given 


by random variable G and given on the path-space by A^^’^ = A, 

2. Assume the conditional independence property for the random variables 


MO,k 
2 ) 




MO,fc 




N, 


MO,k, KMO.k 


Vs^t, s, t G {1, 2,..., T} . 


( 11 ) 


3. Assume that for the random variable for the volume resting on the opposite side of the LOB after the 


placement of limit orders and cancellations we have 


and vice-versa, and that the random variable G N+ is distributed according to a truncated generalized 


yk' ,s 
''t-At 


rC.k'.s 


, where k' = a if k = b. 


Cox process with conditional distribution N^ 

i 


MO,k 


\Rk = r - gcr (Af < r) given by 


Pr( N. 


MO,k 


= n 


RMO,k ^^MO,k Bk ^r) = 


iff 








( 12 ) 


■'3=0 o'. 

4- Assume the independence property for random vectors of latent intensities unconditionally according to 

RfO.k^pMO.k^ s,t G {l,2,...,r}. (13) 

5. Assume that for each intensity random variable $\Lambda^{MO,k}_t$ there exists a corresponding transformed
intensity variable $\Gamma^{MO,k}_t \in \mathbb{R}$, and the relationship for each element is given by the mapping

$$\Lambda^{MO,k}_t = \mu^{MO,k}_0 F\left( \Gamma^{MO,k}_t \right) \tag{14}$$

for some baseline intensity parameter $\mu^{MO,k}_0 \in \mathbb{R}^+$ and strictly monotonic mapping
$F : \mathbb{R} \mapsto [0, 1]$.

6. Assume that the random variables $\Gamma^{MO,k}_t \in \mathbb{R}$, characterizing the intensity before
transformation of the Generalized Cox-Process, are distributed in interval $[t-1, t)$ according to a univariate
skew-t distribution.

7. Assume that for every element $N^{MO,k}_t$ of market order counts, there is a corresponding random vector
$O^{MO,k}_t \in \mathbb{N}_+^{N^{MO,k}_t}$ of order sizes. We assume that the element $O^{MO,k}_{i,t}$,
$i \in \{1, \ldots, N^{MO,k}_t\}$, is distributed according to $O^{MO,k}_{i,t} \sim H(\cdot)$. Assume also that
market order sizes are unconditionally independent, $O^{MO,k}_{i,t} \perp O^{MO,k}_{i',t'}$
for $i \neq i'$ or $t \neq t'$.
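The count mechanism in items 3 and 5 of Definition 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $F$ is taken to be the standard Normal CDF (the choice made later for the reference model), the count is Poisson conditional on the intensity, and the truncation at the resting volume $r$ is taken to be inclusive. The function names are hypothetical, and the skew-t draw of the transformed intensity is not shown.

```python
import math
import random


def normal_cdf(x):
    """Standard Normal CDF, used here as the monotonic map F: R -> [0, 1]."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_poisson_pmf(lam, r):
    """Probabilities Pr(N = n), n = 0..r, for a Poisson(lam) truncated to {0, ..., r}."""
    weights = [1.0]
    for n in range(1, r + 1):
        # Recursively build lam^n / n! to avoid large factorials.
        weights.append(weights[-1] * lam / n)
    total = sum(weights)
    return [w / total for w in weights]


def sample_market_order_count(mu0, gamma, r, rng):
    """Draw one market order count given baseline mu0, transformed intensity gamma
    and resting volume r (assumed inclusive upper bound)."""
    lam = mu0 * normal_cdf(gamma)          # intensity = baseline * F(gamma)
    pmf = truncated_poisson_pmf(lam, r)
    u = rng.random()
    cumulative = 0.0
    for n, p in enumerate(pmf):            # inverse-CDF sampling
        cumulative += p
        if u <= cumulative:
            return n
    return r                               # guard against rounding in the last step


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_market_order_count(5.0, 0.3, 4, rng) for _ in range(10)])
```

Because the support is finite, inverse-CDF sampling is exact and never proposes a count larger than the resting volume.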

We denote the LOB state for the real dataset at time $t$ on a given day by the random vector $L_t$, and this
corresponds to the prices and volumes at each level of the bid and ask. Utilising the stochastic agent-based model
specification described above, and given a parameter vector $\theta$, which will generically represent all parameters of
the liquidity providing and liquidity demanding agent types, one can then also generate simulations of intra-day
LOB activity and arrive at the synthetic state $L^*_t(\theta)$. The state of the simulated LOB at time $t$ is obtained from
the state at time $t - 1$ and a set of stochastic components, denoted generically by $X_t$, which are obtained from
a single stochastic realisation of the following components of the agent-based models:


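• Limit order submission intensities $\Lambda^{LO,b}_t, \Lambda^{LO,a}_t$, order numbers $N^{LO,b}_t, N^{LO,a}_t$,
and order sizes $O^{LO,a,s}_{i,t}, O^{LO,b,s}_{i,t}$, where $s = -l_d + 1, \ldots, l_p$, $i = 1, \ldots, N^{LO,k,s}_t$.

• Limit order cancellation intensities $\Lambda^{C,b}_t, \Lambda^{C,a}_t$ and numbers of cancellations
$N^{C,b}_t, N^{C,a}_t$.

• Market order intensities $\Lambda^{MO,b}_t, \Lambda^{MO,a}_t$, numbers of market orders
$N^{MO,b}_t, N^{MO,a}_t$, and market order sizes $O^{MO,a}_{i,t}, O^{MO,b}_{i,t}$, $i = 1, \ldots, N^{MO,k}_t$.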


These stochastic features are combined with the previous state of the LOB, $L^*_{t-1}(\theta)$, to produce the new state
$L^*_t(\theta)$ for a given set of parameters $\theta$, given by

$$L^*_t(\theta) = G\left( L^*_{t-1}(\theta), X_t \right) \tag{15}$$


$G(\cdot)$ is a transformation that maps the previous state of the LOB and the activity generated in the current
step into a new state, in much the same way as the matching engine updates the LOB after every event. As we
model the activity in discrete intervals, however, the LOB is only updated at the end of every interval, and the
incoming events (limit orders, market orders and cancellations) are processed in the order specified in Section
3.1. Conditional then on a realization of these parameters $\theta$, the trading activity in the LOB can be simulated
according to the procedure described in Algorithm 1.
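The recursion $L^*_t = G(L^*_{t-1}, X_t)$ can be illustrated on a deliberately reduced state. The sketch below tracks the volume at a single price level and assumes, for illustration only, that arrivals are applied first, then cancellations, then executions, with each removal capped at the volume available. The actual processing order is the one given in Section 3.1, and the function names are hypothetical.

```python
def update_level_volume(prev_volume, limit_orders, cancellations, market_orders):
    """One end-of-interval update of the volume resting at a single level.

    Assumed ordering (hypothetical): limit order arrivals, then cancellations,
    then market order executions. Removals can never exceed resting volume.
    """
    volume = prev_volume + limit_orders
    volume -= min(cancellations, volume)
    volume -= min(market_orders, volume)
    return volume


def simulate_path(initial_volume, activity):
    """Apply the state recursion over a list of (limit, cancel, market) tuples."""
    states = []
    state = initial_volume
    for limit_orders, cancellations, market_orders in activity:
        state = update_level_volume(state, limit_orders, cancellations, market_orders)
        states.append(state)
    return states


if __name__ == "__main__":
    print(simulate_path(10, [(5, 3, 4), (0, 20, 1), (2, 0, 0)]))
```

In the full model the state holds prices and volumes at every level on both sides, and market orders on one side consume volume on the other; the single-level version only shows the shape of the update.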


4 Simulation based likelihood calibration 

A common attribute of all agent-based modelling frameworks is that they are able to generate realisations of 
the stochastic process they represent, in our case the LOB process. That is, given a set of specifications for the 
parameters of the agents, the simulation of the agent model is trivial and efficient. However, it is also commonly 
the case that there is either no direct tractable (to evaluate pointwise) likelihood model or the likelihood model 
is complex and computationally costly to evaluate. In these cases, traditional parameter estimation methods 
based on likelihood inference are not directly applicable, when calibrating such models to observed LOB data. 
There are, however, a range of methods, which have yet to be utilised widely in the agent-based modelling
literature, that allow one to still perform calibration of models, i.e. parameter estimation, for models specified
in a simulation based format.

Algorithm 1 Stochastic agent-based LOB simulation
procedure SIMULATE($\theta$, $T$)
    for $t = 1, \ldots, T$ do
        ▷ Simulate Liquidity Provider Limit Orders Bid/Ask.
        for $k = a, b$ do
            ▷ Simulate dependent stochastic intensities for limit order submissions.
            Sample $\Gamma^{LO,k}_t \sim MSt(m^{LO,k}, \gamma^{LO,k}, \nu^{LO,k}, \Sigma^{LO,k})$
            Apply transformation from $\Gamma^{LO,k}_t$ to $\Lambda^{LO,k}_t$
            ▷ Simulate dependent limit order counts at each level bid/ask.
            Sample $N^{LO,k}_t \sim GCP(\Lambda^{LO,k}_t)$
            ▷ Simulate limit order sizes.
            for $s = -l_d + 1, \ldots, l_p$, $i = 1, \ldots, N^{LO,k,s}_t$ do
                Sample $O^{LO,k,s}_{i,t} \sim H(\cdot)$
        ▷ Simulate Liquidity Provider Cancelled Limit Orders Bid/Ask.
        for $k = a, b$ do
            ▷ Evaluate total volumes at each level bid and ask.
            $\tilde{V}^{LO,k}_t = V^{LO,k}_{t-1} + N^{LO,k}_t$
            ▷ Simulate dependent stochastic intensity for bid and ask cancellation counts.
            Sample $\Gamma^{C,k}_t \sim MSt(m^{C,k}, \gamma^{C,k}, \nu^{C,k}, \Sigma^{C,k})$
            Apply transformation from $\Gamma^{C,k}_t$ to $\Lambda^{C,k}_t$
            ▷ Simulate dependent limit order cancellation counts at each level of the bid/ask.
            Sample $N^{C,k}_t \sim GCP(\Lambda^{C,k}_t) \, \mathbb{1}(N^{C,k}_t \leq \tilde{V}^{LO,k}_t)$
        ▷ Simulate Liquidity Demander Market Orders.
        for $k = a, b$ do
            ▷ Evaluate the current resting volumes on each level of the bid/ask.
            $R^{k}_t = \sum_{s} \left[ \tilde{V}^{LO,k',s}_t - N^{C,k',s}_t \right]$
            ▷ Simulate stochastic intensities for market order submissions.
            Sample $\Gamma^{MO,k}_t$ from skew-t distribution.
            Evaluate transformation $F(\Gamma^{MO,k}_t)$ in Equation (14)
            ▷ Simulate market order counts.
            Sample $N^{MO,k}_t \sim GCP(\Lambda^{MO,k}_t) \, \mathbb{1}(N^{MO,k}_t \leq R^{k}_t)$ via Equation (12)
            ▷ Simulate market order sizes.
            for $i = 1, \ldots, N^{MO,k}_t$ do
                Sample $O^{MO,k}_{i,t} \sim H(\cdot)$
    return $L^* = \{L^*_1, \ldots, L^*_T\}$

The structure of our model ensures that we can capture features such as the non-linear dependencies between
the activity at different LOB levels. This activity includes limit order submissions that can be passive or
aggressive, cancellations and market orders, and can arise from two different classes of agents. Given this
complexity, the likelihood is not available in a tractable distributional form. We therefore propose estimating the
model via a simulation-based likelihood inference method called Indirect Inference, for which we develop a novel
extension.


4.1 Background on Indirect Inference 

There is a substantial body of academic work related to simulation-based likelihood inference, and we focus on
the subclass known as Indirect Inference, introduced by Smith [1993] and Gourieroux et al. [1993] and
covered extensively in Gallant and Tauchen [1996], Gourieroux et al. [2006] and the book length coverage in
Gourieroux and Monfort [1997]. At its most fundamental level, Indirect Inference is a technique for parameter
estimation in simulation based stochastic models. These are models for which one cannot evaluate the density
for the data generating model, but for which one can generate data given a set of parameters. One can then
compare the simulated data with the observed data, and obtain a measure of fitness for a set of parameters based
on this comparison.

To achieve this via Indirect Inference, one introduces a new model, called the 'auxiliary model', which is
mis-specified and typically not even generative, but can generally be estimated easily via, for instance, maximum
likelihood estimation. This auxiliary model has its own parameter vector $\beta$, with point estimator $\hat{\beta}$. These
parameters of the auxiliary model describe aspects of the distributions of the observations. The idea of Indirect
Inference is then to simply try to match aspects of the estimated auxiliary model parameters on the observed
data $y$, given by $\hat{\beta}(y)$, and the estimated auxiliary model parameters on the simulated data $y^*(\theta)$, which is
obtained through simulation using parameters of the actual model $\theta$, given by $\hat{\beta}(y^*(\theta))$.

One sees that Indirect Inference only requires that the model one wants to estimate can be simulated, and 
proceeds by fitting a simpler auxiliary model to both the simulated and the real data. Estimates of the model 
parameters are then obtained by minimising the difference between the parameter vectors of the auxiliary model 
fit to the simulated data and the real data. 

When considering the choice of an auxiliary model, the simplest form one may consider involves a comparison
between a single summary statistic calculated on the real observed data, say $y$, and on the simulated
synthetic data $y^*$. Alternatively, one may consider methods that use a vector of summary
auxiliary parameters, such as in Winker et al. [2007], who consider minimization of a weighted L2 quadratic error
function between the real data vector of estimated moments and the synthetic simulated data equivalents. Others
who have adopted such methods include McFadden [1989] and Pakes and Pollard [1989], who each proposed a
modification of the method of moments estimator, called the Method of Simulated Moments (MSM). Other,
alternative simulation-based estimation techniques include the simulated maximum likelihood (SML) and the
method of simulated scores (MSS). Such techniques have been used in the estimation of a number of economic
models, for example dynamic stochastic general equilibrium (DSGE) models [Ruge-Murcia, 2007] and Markov
models of asset pricing [Duffie and Singleton, 1993].


In this paper, the auxiliary models we consider are based on aspects of the LOB stochastic process that we 
analyze. The key features we consider include the variation in the price and the volume resting in the LOB. In 
particular, we would like to capture the clustering of volatility in intra-day log returns and the dynamic behaviour 
of total volume in the first n levels of the LOB. 

In detail, the sequence for obtaining the Indirect Inference estimator is as follows:

1. Take the observed sequence of LOB states $L_{1:T}$ and transform them to auxiliary model data set
$y = \mathcal{T}(L_{1:T})$.

2. Using observed auxiliary model data $y$, estimate auxiliary model parameters $\hat{\beta}(y)$.

3. Initialize parameter vector of stochastic agent LOB model, in our case liquidity provider and liquidity
demander agent models parameters $\theta^{(0)}$. Then simulate a synthetic realization of the LOB model
$L^*_{1:T}(\theta^{(0)})$ from the stochastic agent model.

4. Take the synthetic sequence of LOB states $L^*_{1:T}$ and transform them to auxiliary model data set
$y^*(\theta^{(0)}) = \mathcal{T}(L^*_{1:T}(\theta^{(0)}))$.

5. Using synthetic auxiliary model data $y^*(\theta^{(0)})$, estimate auxiliary model parameters
$\hat{\beta}_0(y^*(\theta^{(0)}))$.

6. Estimate Mahalanobis distance or Euclidean distances between auxiliary parameter vectors
$\mathcal{D}\left( \hat{\beta}(y), \hat{\beta}_0(y^*(\theta^{(0)})) \right)$.

7. Set optimal parameter vector $\hat{\theta} = \theta^{(0)}$ with distance
$\mathcal{D}_{min} = \mathcal{D}\left( \hat{\beta}(y), \hat{\beta}_0(y^*(\theta^{(0)})) \right)$.

8. Repeat steps 3 to 7 with proposed parameter vector $\theta^{(j)}$ until convergence or for $J$ total iterations, with
step 7 applied conditionally on the event

$$\mathcal{D}_{min} > \mathcal{D}\left( \hat{\beta}(y), \hat{\beta}_j(y^*(\theta^{(j)})) \right).$$
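The eight steps above can be traced on a toy problem that is much simpler than the LOB model. In the sketch below, which is an illustration under stated assumptions rather than the procedure used in this paper, the model to be estimated is an MA(1) process with coefficient `theta`, the auxiliary model is an AR(1) regression fitted by least squares, the distance is the absolute difference of the two auxiliary estimates, and the proposals are simply a fixed grid of candidate values. All function names are hypothetical.

```python
import random


def simulate_ma1(theta, n, seed):
    """Simulate y_t = e_t + theta * e_{t-1} with standard Normal innovations."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    ys = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        ys.append(e + theta * e_prev)
        e_prev = e
    return ys


def fit_ar1(ys):
    """Auxiliary estimator: least squares slope of y_t on y_{t-1}."""
    num = sum(a * b for a, b in zip(ys[1:], ys[:-1]))
    den = sum(b * b for b in ys[:-1])
    return num / den


def indirect_inference(observed, candidates, n_sim, seed):
    """Return the candidate whose simulated auxiliary fit is closest to the observed fit."""
    beta_obs = fit_ar1(observed)                     # steps 1 and 2
    best_theta, best_dist = None, float("inf")
    for theta in candidates:                         # steps 3 to 8
        # A fixed seed gives common random numbers across candidates,
        # so the distance varies smoothly with theta.
        beta_sim = fit_ar1(simulate_ma1(theta, n_sim, seed))
        dist = abs(beta_obs - beta_sim)
        if dist < best_dist:                         # conditional update of step 7
            best_theta, best_dist = theta, dist
    return best_theta, best_dist


if __name__ == "__main__":
    data = simulate_ma1(0.5, 20000, seed=1)
    grid = [i / 20 for i in range(20)]
    print(indirect_inference(data, grid, 20000, seed=2))
```

The auxiliary AR(1) model is mis-specified for MA(1) data, which is exactly the situation Indirect Inference is designed for: only the mapping from `theta` to the auxiliary estimate needs to be informative.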


Several theoretical properties are known about the estimators obtained from such a data generative procedure,
see discussions in Smith [2008] and Genton and Ronchetti [2003]. Under certain assumptions it can be shown that
the Indirect Inference procedure produces a point estimator of the model parameters which is both consistent
and asymptotically Normal under fairly unrestrictive regularity conditions (Gourieroux and Monfort [1997]):

1. The likelihood, which we maximise in order to estimate the auxiliary model parameters $\beta$, tends
asymptotically to a non-stochastic limit.

2. This limit is continuous in the simulation model parameters $\theta$.

3. The so-called binding function linking the parameters of the actual model we are trying to estimate to the
parameters of the auxiliary model is one-to-one, and its derivative with respect to the parameters of the
actual model is of full column rank.

In addition, Indirect Inference can be shown to be asymptotically efficient when the model is correctly specified
for the observed data.


4.2 Multi-objective Indirect Inference for simulation-based model calibration 

To perform estimation of our agent stochastic model, we develop a novel extension of simulation-based estimation 
procedures which combines two key ideas: simulation-based likelihood inference based on Indirect Inference, and 
multi-objective optimisation methods, typically utilised in genetic search algorithms. We denote the resulting 
class of estimation methods as Multi-objective-II. The proposed Multi-objective-II estimation framework, unlike 
standard indirect inference, is designed to allow one to utilise multiple auxiliary models, each capturing different 
features of the LOB stochastic process. In this sense, this is a multi-objective extension of standard Indirect
Inference procedures, which will naturally allow us to explore relevant features of the target stochastic process
given by the LOB.

To proceed with the specification of the multi-objective-II estimation methodology, in addition to the LOB
simulation framework described above, we need to specify

• The auxiliary model(s), each parameterised by a set of parameter vectors, generically denoted by $\beta$, which
are determined according to the features of the observed data stochastic process we would like to
approximate with our model.

• The objective function quantifying the difference in the auxiliary model(s) parameters fit to the real data
(for which we will use the shorthand $\hat{\beta}$ to represent $\hat{\beta}(y)$) and the auxiliary model(s) parameter fits to the
synthetically generated data (where we will use the shorthand $\hat{\beta}^*(\theta)$ for $\hat{\beta}(y^*(\theta))$).

• The search method that will explore the parameter space of the stochastic agent-based model when
performing simulation based optimization for stochastic agent LOB model calibration.


4.2.1 The auxiliary models 


The auxiliary model(s), sometimes known as the estimating function(s), serve to capture aspects of the real data 
that we want reflected in our simulation, i.e. they do not necessarily have to correspond closely to the data 
generating process, but each should capture some relevant features that will inform estimation of the stochastic 
simulation model parameters. In standard Indirect Inference methods, there is only one auxiliary model utilised 
which usually comes from a relatively simple class of models; for guidelines relating to selection see Heggland
and Frigessi [2004].


In our framework, for a given candidate parameter vector $\theta$ we generate $M$ realisations of trajectories of
the LOB process, i.e. $\{L^*_{1:T,m}(\theta)\}_{m \in \{1, 2, \ldots, M\}}$, from the stochastic agent-based LOB model. Then for each
auxiliary model, parameterised by some vector, generically denoted by $\beta$, we utilise the simulated data to obtain
estimates of the auxiliary model parameters, for instance via a maximum likelihood framework:

$$\hat{\beta}^*(\theta) = \arg\max_{\beta} \sum_{m=1}^{M} \sum_{t=1}^{T} \log f\left( \mathcal{T}(L^*_{t,m}(\theta)) \mid \mathcal{T}(L^*_{1:t-1,m}(\theta)); \beta \right). \tag{16}$$


In principle, one can adopt as many auxiliary models as is deemed desirable for a particular application.
However, several authors have explored the effect of the number of objective functions $K$ on the estimation
performance under a multi-objective optimization framework. For instance, Purshouse and Fleming [2003] and
Hughes [2005] suggest that Pareto-ranking based methods, such as the one used in this paper, scale poorly with
the number of objectives. Köppen et al. [2005] explain that an increase in the number of objectives may have a
detrimental effect on the optimisation because the probability of dominance in a Pareto optimality based
multi-objective framework will go to zero. A second issue with having a large number of objectives is the difficulty
in comparing the results qualitatively, since in a task with $K$ objectives, a set of solutions lies in a $K - 1$
dimensional hyperspace. Based on this guidance, we focus on capturing two core features of the LOB stochastic
process, related to the evolution of the price and the properties of the volume resting near the top of the book.

Auxiliary Model 1 - Price features: If we denote the mid-price as $p^{mid}_t = \frac{p^{a,1}_t + p^{b,1}_t}{2}$, then the
log return is defined as

$$r_t = \ln \frac{p^{mid}_t}{p^{mid}_{t - \Delta t}}$$

where $\Delta t$ is a suitable interval, in our case 1 minute. The timeseries of log returns for a typical day for an
illustrative stock, GDF Suez, is presented in the accompanying figure.

This illustrative timeseries displays typical features of mid price dynamics, such as heteroskedasticity. The
presence of ARCH effects was formally confirmed by an ARCH-LM test. Hence, the volatility
$\sigma_t = \sqrt{\mathrm{Var}(r_t \mid r_{t-1}, \ldots)}$ is not constant, and can be captured with a generalised autoregressive
conditionally heteroskedastic model, or GARCH(p,q) model, where with $r_t = \sigma_t \eta_t$ and $\eta_t \sim N(0, 1)$, we have
for the squared volatility

$$\sigma^2_t = a_0 + a_1 r^2_{t-1} + \ldots + a_p r^2_{t-p} + b_1 \sigma^2_{t-1} + \ldots + b_q \sigma^2_{t-q}$$

where $a_i > 0$, $b_j > 0$ for all $i \in \{1, \ldots, p\}$ and $j \in \{1, \ldots, q\}$. For simplicity of the auxiliary model we utilise a
GARCH(1,1) model for this aspect of the data, parameterized by $\beta_1 = (a_0, a_1, b_1)$.
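To make the GARCH(1,1) auxiliary model concrete, the following sketch evaluates the variance recursion and the Gaussian negative log-likelihood that a maximum likelihood routine would minimise over $(a_0, a_1, b_1)$. It is a minimal illustration: initialising the recursion at the unconditional variance $a_0 / (1 - a_1 - b_1)$ is an assumption made here, and the function names are hypothetical.

```python
import math


def garch11_variances(returns, a0, a1, b1):
    """Conditional variances sigma_t^2 = a0 + a1 * r_{t-1}^2 + b1 * sigma_{t-1}^2."""
    if a1 + b1 >= 1.0:
        raise ValueError("a1 + b1 must be below 1 for a finite unconditional variance")
    # Assumption: start the recursion at the unconditional variance.
    sigma2 = [a0 / (1.0 - a1 - b1)]
    for r_prev in returns[:-1]:
        sigma2.append(a0 + a1 * r_prev ** 2 + b1 * sigma2[-1])
    return sigma2


def garch11_neg_loglik(returns, a0, a1, b1):
    """Gaussian negative log-likelihood of the returns under GARCH(1,1)."""
    sigma2 = garch11_variances(returns, a0, a1, b1)
    return 0.5 * sum(math.log(2.0 * math.pi * s) + r * r / s
                     for r, s in zip(returns, sigma2))


if __name__ == "__main__":
    rets = [0.01, -0.02, 0.015]
    print(garch11_variances(rets, 1e-5, 0.1, 0.8))
    print(garch11_neg_loglik(rets, 1e-5, 0.1, 0.8))
```

Fitting the auxiliary model to observed and simulated one-minute log returns then amounts to minimising `garch11_neg_loglik` over the three parameters with any numerical optimiser.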

Auxiliary Model 2 - Volume features: In the accompanying figure we demonstrate an example of the volume on the bid
and ask side for a typical day for stock GDF Suez. We fit an ARIMA model to this data, in order to capture the
time series structure of the LOB volumes. We will err on the side of parsimony during model identification, as
we would like to obtain an auxiliary model with few parameters in our Indirect Inference procedure.

We first remove observed linear trends present in the LOB volume timeseries throughout the day by taking
first differences. The resulting sample ACF and PACF indicate that an
MA(1) model is appropriate. Hence, we fit an ARIMA(0,1,1) model to the volume data.
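The identification logic for the volume model can be sketched in a few lines: difference the series once, compute the lag-1 sample autocorrelation, and invert the MA(1) relationship $\rho_1 = \vartheta / (1 + \vartheta^2)$ to obtain the invertible coefficient. This moment-based estimator is a stand-in chosen for brevity, not the estimation method used in the paper (an ARIMA fit would normally use maximum likelihood), and the function names are hypothetical.

```python
import math


def first_difference(xs):
    """Return the series of first differences x_t - x_{t-1}."""
    return [b - a for a, b in zip(xs[:-1], xs[1:])]


def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    return sum(a * b for a, b in zip(dev[1:], dev[:-1])) / sum(d * d for d in dev)


def ma1_coefficient_from_rho(rho1):
    """Invertible MA(1) coefficient solving rho1 = c / (1 + c^2)."""
    if rho1 == 0.0:
        return 0.0
    if abs(rho1) > 0.5:
        raise ValueError("no MA(1) process has |rho1| > 0.5")
    return (1.0 - math.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)


if __name__ == "__main__":
    volumes = [1200.0, 1350.0, 1280.0, 1400.0, 1330.0, 1450.0, 1390.0]
    diffs = first_difference(volumes)
    rho = lag1_autocorrelation(diffs)
    print(rho)
```

A sample autocorrelation outside $[-0.5, 0.5]$ signals that an MA(1) description of the differenced series is not adequate, which is why the last function raises in that case.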


4.2.2 Combining multi-objective optimisation and Indirect Inference 


Thus far, for a given set of parameters in our stochastic LOB agent model, we have simulated the order book
process. This simulated data was then utilised to construct a framework in which we obtained multiple fitted
parameter vectors, one for each auxiliary model considered. We now need to consider how to judge the suitability
of the model parameter vector in capturing the true observed LOB stochastic process dynamics.

In standard Indirect Inference based frameworks, one would concatenate all the auxiliary model output 
parameter vector estimates into a single vector of auxiliary model parameters, in order to produce a single 
distance measure or discrepancy between the simulated data and actual data. This concatenation induces a loss 
in information, as for instance some auxiliary parameter model discrepancies may be on different scales to others. 
Therefore, if a naive concatenation is applied, this often results in domination of a select few criteria, rather than 
considering each component in its own right. 

We overcome this issue through the introduction of a multi-objective optimization framework. Such methods
naturally adapt the simulation-based estimation to allow for competing criteria when assessing the suitability
of the stochastic agent LOB model parameters via a collection of auxiliary model fits. The multi-objective
optimisation method thus enables us to consider multiple distance measures, or discrepancy scores, as separate
objective functions.

In this framework, the fitness, or suitability, of a parameter vector $\theta$ for the stochastic agent LOB model is
measured by simulating from the generative model and quantifying the difference between each auxiliary model's
parameters. Each auxiliary model is fit to both the transformation of the observed data to obtain $\hat{\beta}_k$ and to the
transformation of the simulated LOB data to obtain $\hat{\beta}^*_k(\theta)$, for which a discrepancy score is calculated by
measuring the distance between the two. The most commonly utilised distance measures are based on some form of
weighted or unweighted norm, such as the $L_p$-norm, or Minkowski distance of order $p$, of which the
$L_\infty$-norm, the $L_1$-norm and the $L_2$-norm are frequently used in practice. We adopted the $L_2$-norm to measure
discrepancies for both the price and volume-based auxiliary models we considered, generically given for the $k$-th
auxiliary model by

$$\mathcal{D}_k(\theta) = \sqrt{ \sum_{i=1}^{q_k} \left( \hat{\beta}_{k,i} - \hat{\beta}^*_{k,i}(\theta) \right)^2 } \tag{17}$$

for each $q_k$-dimensional auxiliary model, $k = 1, \ldots, K$.
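As a small illustration of equation (17), the sketch below computes one $L_2$ discrepancy per auxiliary model and returns them as a vector of objectives, without concatenating the auxiliary parameters. The function names and the example numbers are hypothetical.

```python
import math


def l2_discrepancy(beta_obs, beta_sim):
    """Euclidean distance between two auxiliary parameter vectors of equal length."""
    if len(beta_obs) != len(beta_sim):
        raise ValueError("auxiliary parameter vectors must have the same dimension")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(beta_obs, beta_sim)))


def objective_vector(obs_fits, sim_fits):
    """One discrepancy per auxiliary model, kept as separate objectives."""
    return [l2_discrepancy(o, s) for o, s in zip(obs_fits, sim_fits)]


if __name__ == "__main__":
    # Hypothetical fits: a three-parameter price model and a one-parameter volume model.
    observed = [[0.0, 0.0, 0.0], [1.0]]
    simulated = [[3.0, 4.0, 0.0], [0.5]]
    print(objective_vector(observed, simulated))
```

Keeping the two distances separate is what allows auxiliary models on very different scales to be compared through Pareto dominance rather than through a single weighted sum.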


4.2.3 Multi-objective optimisation and the role of Pareto optimality 

When our search is for an optimal parameter vector $\theta$ that should satisfy multiple objective functions, in a vector
$\mathcal{D}(\theta) := [\mathcal{D}_1(\theta), \ldots, \mathcal{D}_K(\theta)]$ to be minimised, there are many cases where there will not be a
global minimum with respect to each individual objective. In this case, one can consider, as an alternative to the
single optimal value produced by an optimisation method, the notion of Pareto optimality, in reference to the
Pareto efficient frontier. Informally, this is the search for solutions such that there is no solution in the search
space that can unilaterally improve a single criterion (objective function) without worsening another criterion, and
this is formally defined in Definition 4 for the case of our estimation framework.

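Definition 4 (Pareto Optimal Dominance of Parameter Solutions). Consider the set of $K$ auxiliary models
producing parameter vectors $\{\hat{\beta}^*_k(\theta)\}_{k \in \{1, \ldots, K\}}$, based on an underlying parameter vector
$\theta \in \Omega$, that produce, for selected objective functions, the values
$\mathcal{D}(\theta) := [\mathcal{D}_1(\theta), \ldots, \mathcal{D}_K(\theta)]$. Then the selection of $\theta \in \Omega$ is called
Pareto-optimal (or non-dominated) with respect to the set of solutions in the feasible region $\Omega$, if

$$\nexists \, \tilde{\theta} \in \Omega \;\; \text{s.t.} \;\; \mathcal{D}(\tilde{\theta}) \prec \mathcal{D}(\theta), \tag{18}$$

where we say that $\mathcal{D}(\tilde{\theta})$ dominates $\mathcal{D}(\theta)$, denoted by $\mathcal{D}(\tilde{\theta}) \prec \mathcal{D}(\theta)$, if

$$\mathcal{D}_k(\tilde{\theta}) \leq \mathcal{D}_k(\theta) \;\; \forall k \in \{1, 2, \ldots, K\} \quad \text{and} \quad \exists k \;\; \text{s.t.} \;\; \mathcal{D}_k(\tilde{\theta}) < \mathcal{D}_k(\theta). \tag{19}$$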
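Definition 4 translates directly into code. The sketch below checks dominance between two objective vectors (all objectives minimised) and filters a finite set of candidates down to its non-dominated members. It works on a finite list of evaluated candidates rather than the full feasible region, and the function names are hypothetical.

```python
def dominates(d_a, d_b):
    """True if objective vector d_a dominates d_b: no worse everywhere, better somewhere."""
    no_worse = all(a <= b for a, b in zip(d_a, d_b))
    strictly_better = any(a < b for a, b in zip(d_a, d_b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 4), (2, 2), (4, 1), (3, 3), (2, 5)]
    print(pareto_front(candidates))
```

Note that a vector never dominates itself, since the strict inequality in (19) fails, so identical objective vectors are both retained.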


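From this, we can then state the overall objective, incorporating all $K$ auxiliary models and a common selection
of $L_2$-norm objective functions for the parameter vector $\theta$ of the stochastic agent-based model as follows

$$\begin{aligned}
\hat{\theta} &= \arg\min_{\theta \in \Omega} \left\{ \mathcal{D}_1(\theta), \ldots, \mathcal{D}_K(\theta) \right\} \\
&= \arg\min_{\theta \in \Omega} \left\{ \mathcal{D}\left( \hat{\beta}_1, \hat{\beta}^*_1(\theta) \right), \ldots, \mathcal{D}\left( \hat{\beta}_K, \hat{\beta}^*_K(\theta) \right) \right\} \\
&= \arg\min_{\theta \in \Omega} \left\{ \left\| \hat{\beta}_1 - \hat{\beta}^*_1(\theta) \right\|_2, \ldots, \left\| \hat{\beta}_K - \hat{\beta}^*_K(\theta) \right\|_2 \right\} \\
&\text{subject to } \theta^{L}_1 \leq \theta_1 \leq \theta^{U}_1, \ldots, \theta^{L}_n \leq \theta_n \leq \theta^{U}_n
\end{aligned} \tag{20}$$

where $\theta_i \in [\theta^{L}_i, \theta^{U}_i]$, for all $i$, which denote the boundaries of the feasible region $\Omega$.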

To complete the specification of the multi-objective Indirect Inference simulation based estimation framework
we propose, we require a method to search the constrained parameter space $\Omega$ for feasible and Pareto optimal
solutions. A variety of stochastic search methods are available for use in this context, see discussion in Coello
et al. [2007].

We propose the use of an evolutionary genetic search method for this purpose, known in the literature as
Multi-Objective Evolutionary Algorithms (MOEAs). We develop a version of such a stochastic search framework which
combines the widely utilised NSGA-II genetic search algorithm by Deb et al. [2002], which is a Pareto-ranking
based method, with an additional mutation kernel we designed specifically for a covariance matrix mutation
operator, based on the framework developed in Peters et al. [2012]. This additional mutation component is
combined with the framework of NSGA-II, to ensure that the covariance matrices in the stochastic
agent LOB model, which are proposed at each step of the search, remain positive definite and symmetric. Details
of this genetic search algorithm are provided in the appendix.
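The requirement that proposed covariance matrices stay symmetric and positive definite can be met in several ways. The sketch below shows one generic construction, which is not the mutation kernel of this paper (that kernel follows Peters et al. [2012] and is given in the appendix): perturb the lower-triangular Cholesky factor of the current matrix and rebuild the covariance from the perturbed factor. The function names and the perturbation scale are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def mutate_covariance(sigma, scale, rng):
    """Propose a new covariance by adding Gaussian noise to the Cholesky factor.

    The product of the perturbed factor with its transpose is symmetric and
    positive semi-definite by construction, and positive definite whenever
    the perturbed diagonal entries are all non-zero.
    """
    lower = cholesky(sigma)
    n = len(sigma)
    for i in range(n):
        for j in range(i + 1):
            lower[i][j] += rng.gauss(0.0, scale)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    current = [[2.0, 0.5], [0.5, 1.0]]
    print(mutate_covariance(current, 0.1, random.Random(0)))
```

Mutating the factor rather than the matrix entries avoids having to repair or reject proposals that fall outside the cone of valid covariance matrices.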
5 Stochastic agent LOB model assessment and calibration to real LOB data

We have provided a description of the stochastic agent-based LOB model we developed for modelling trading 
interactions and their dependency. In addition, we have developed a method for the calibration of model
parameters to observed LOB data. In this section, we illustrate the results of this calibration on real data, through a
sequence of studies which aim to practically assess the importance of each component of the stochastic agent LOB 
model specification. To achieve this, we make a number of model simplifications and progressively relax these 
simplifying assumptions, in order to provide an understanding of the role each feature of our proposed model 
plays in the simulation framework. The reference model is the basic framework against which we compare the 
more detailed versions of the model, as detailed below. 


5.1 Developing a baseline simplified reference stochastic agent LOB model 

In the stochastic agent-based LOB model, the liquidity provider agent has limit order submission and cancellation
components which each require the specification of four independent $l_t$-dimensional multivariate skew-t
distributions for the bid and ask sides, with $l_p = 5$ 'passive' levels and $l_d = 3$ 'direct', or aggressive levels for a
total of $l_t = 8$ actively modelled levels for each side of the book. For each of these stochastic model components
we require the estimation of the parameters: $m \in \mathbb{R}^{l_t}$, the location for the mean intensity vector;
$\gamma \in \mathbb{R}^{l_t}$, the skewness of the stochastic intensity vector; $\nu \in \mathbb{R}^+$, which directly influences
the heavy-tailedness of the stochastic intensity vector; and $\Sigma \in \mathbb{R}^{l_t \times l_t}$, the covariance matrix of
the stochastic intensity vector for order arrivals. We consider aggregate activity in 10 second intervals, and for the
8.5 hour trading days for the asset under consideration here, we have $T = 3060$ intervals in the day. The basic
reference model is characterised by the following model assumptions:


• We assume that the associated limit order submission distributions for the bid and ask have common 
parameter value settings. In addition, market order submission distributions for the bid and ask are 
also assumed to have common parameter value settings. This is reasonably consistent with empirical 
observations for a large number of assets when observing the submission activity on either side of the LOB 
throughout the trading day. 

• Since the vast majority of orders get cancelled prior to execution, we consider the parameters of the 
distribution of cancellations to also match the distribution of limit order placements. 


• We also set $m = 0$ and consider the skewness vector, $\gamma$, to take a common value in all levels of the bid and
ask such that $\gamma = \gamma_0 \mathbf{1}$, where $\mathbf{1}$ is a vector of ones.

• The monotonic mapping $F(\cdot)$, transforming the random variables $\Gamma^{LO,k,s}_t$, $\Gamma^{C,k,s}_t$,
$\Gamma^{MO,k}_t$ to the intensity random variables $\Lambda^{LO,k,s}_t$, $\Lambda^{C,k,s}_t$, $\Lambda^{MO,k}_t$, is set as the
CDF of the standard Normal. This transformation is necessary in order to ensure that intensities are positive, and
to bound the event counts.


• For the baseline intensities of limit order activity at each level, we assume that they will be the same for
the passive limit orders on both sides, i.e.
$\mu^{LO,a,1}_0 = \ldots = \mu^{LO,a,l_p}_0 = \mu^{LO,b,1}_0 = \ldots = \mu^{LO,b,l_p}_0 = \mu^{LO,p}_0$,
while 'aggressive' limit orders will have a different limit order intensity, i.e.
$\mu^{LO,a,-l_d+1}_0 = \ldots = \mu^{LO,a,0}_0 = \mu^{LO,b,-l_d+1}_0 = \ldots = \mu^{LO,b,0}_0 = \mu^{LO,d}_0$.
Market order baseline intensities are also equal on either side, i.e.
$\mu^{MO,a}_0 = \mu^{MO,b}_0 = \mu^{MO}_0$. The cancellation baseline activity will be the same as the submission
baseline activity.

• Finally, we assume constant order sizes, i.e. $O^{LO,k,s}_{i,t} = c = O^{MO,k}_{j,t}$ for all
$i \in \{1, \ldots, N^{LO,k,s}_t\}$, $j \in \{1, \ldots, N^{MO,k}_t\}$, $k \in \{a, b\}$, $s \in \{-l_d + 1, \ldots, l_p\}$
and $t \in \{1, \ldots, T\}$.

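Hence, the basic reference model has a parameter vector made up of the baseline intensities
$\mu^{LO,p}_0$, $\mu^{LO,d}_0$, $\mu^{MO}_0$, the tail parameter $\nu$ and the common skewness $\gamma_0$, as well as
the covariance matrix $\Sigma$ to be estimated.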

The cancellations are modelled by a dynamically evolving volume process, i.e. the Cox process is truncated
to the available number of orders at each level, as specified in the model by
$N^{C,k,s}_t \mid \tilde{V}^{LO,k,s}_t = v \sim GCP(\lambda) \, \mathbb{1}(N^{C,k,s}_t \leq v)$, where we denote by
$V^{k,s}_{t-1}$ the volume at level $s$ at the start of the $[t - 1, t)$ interval
and $\tilde{V}^{LO,k,s}_t$ is the volume available after the arrival of the limit orders at time $t$, but before the
cancellations and executions. One can simulate from the model, in order to obtain the state of the LOB at time $t$,
$L^*_t$, and thus the available volume $v$, so that one can then draw from a truncated Poisson distribution with a
truncation limit of $v$.

Before we begin the study of the stochastic agent-based LOB model and its calibration and simulation
behaviour, we first show for a representative trading day, the evolution of the spread, as well as the intensity of
the volume process around the top of the book, for one of the most liquid stocks in the CAC40, namely BNP
Paribas (see the accompanying figure). This provides an illustration of the LOB dynamics we should aim to recover
with the model once accurately calibrated. We estimate the model on the data from this day, as an illustration of
the calibration procedure.

5.2 Reference model: Calibration 

We present in the accompanying table the results of the estimation using the Multi-objective-II approach proposed
in this paper. There are 8 non-dominated solutions spread out across the Pareto optimal front, each of which also
has an associated covariance matrix. These matrices have not been included here due to space considerations;
instead we provide the trace as a summary. In the table, we also present a further 4 solutions with a
non-domination rank of 2, i.e. parameter vectors which were dominated in both objective functions by only one
other parameter vector. We present the non-domination rank, as well as the objective function values of the entire
final parameter population, in the accompanying figure. We note that in terms of the 2 objective function values
associated with these parameter vectors, these are spread out across the Pareto front.
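The ranking described above, in which a rank of 2 means being dominated by exactly one other parameter vector, can be computed by counting dominating solutions. The sketch below follows that description; it is an assumption that this matches the ranking reported in the results, and the front-peeling sort used inside NSGA-II can assign different ranks to solutions dominated by several others. The function names and example values are hypothetical.

```python
def dominates(d_a, d_b):
    """True if d_a is no worse than d_b in every objective and better in at least one."""
    return (all(a <= b for a, b in zip(d_a, d_b))
            and any(a < b for a, b in zip(d_a, d_b)))


def domination_rank(points):
    """Rank 1 for non-dominated points, otherwise 1 plus the number of dominating points."""
    return [1 + sum(dominates(q, p) for q in points) for p in points]


if __name__ == "__main__":
    population = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(domination_rank(population))
```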

We assess the fit by a qualitative comparison of the simulations produced with the estimated parameters. In
the accompanying figure we present, for the first 2 Pareto optimal solutions of the parameter vectors in the table,
summaries of the price process for repeated simulations, as well as an example of the LOB evolution throughout
the day. We see that the two Pareto optimal solution parameter vectors produce a broad variety of different price
trajectories over repeated simulations. In particular, some points on the Pareto front of solutions for this basic
reference model produce a time series of simulated prices which replicates a trading day with relatively volatile
trade activity, whilst other points on the Pareto front favour more constrained simulated price activity. This
difference is likely due to the baseline rate of market orders being higher, relative to the baseline limit order
rates, in the first Pareto optimal solution than in the second.

In the appendix we provide further calibration results for the reference model, for multiple assets, over an
extended period of 15 trading days. Summarising these results, we show that within the set of solutions produced
by our estimation procedure, there is very commonly a subset which produces simulations that are similar to
real trading observations in terms of their price and volume behaviour, which are the summaries of the LOB
to which our auxiliary models relate.

5.3 Relaxing assumptions of the reference stochastic agent LOB model

The baseline model results are encouraging. However, we still need to determine what influence the simplifying
statistical model assumptions made in the reference model specification have on the calibration performance. This
will now be assessed by progressively relaxing the assumptions and making less restrictive model assumptions.
Our criterion for improvement relative to the reference model will be a reduction in the values of the objective
functions of the solutions on the Pareto optimal front. We will only suggest that particular features should be
relaxed if we observe such an improvement.


5.3.1 Incorporating an order size distribution


In our basic reference model, we assumed that order sizes are constant, i.e. all limit order submissions,
cancellations and executions were for an equal number of shares. This is similar to the model of Cont et al. [2010],
which assumed that all orders are of unit size, which they set to correspond to the average size of limit orders
observed for the asset. Abstracting away the order size aspect is an approximation one can make in order to
simplify the model. However, such a simplifying assumption is not likely to be supported by the data, as we
illustrate in the accompanying figure. Clearly, one observes that there is a range of distribution shapes for the
order sizes of different assets.

It is clear that the distribution of order sizes will be affected by features such as minimum order sizes on an
exchange (in number of shares, lots, or weight, depending on what is being traded). We observe empirically that
for a range of equities traded in a number of countries, the distribution of order sizes has clear peaks at round
figures; see the order size figure for evidence of order volumes clustering at multiples of 100 shares, for example.
This clustering appears to be independent of the level at which orders are submitted, of whether the order is a
buy or a sell, and of the intensity of the order submissions in that period.

Therefore, we present a case study where we relax the assumption of a fixed order size, by considering instead 
a stochastic model where we assume that the order size is drawn from a mixture of distributions. In this case, we 
assume that both the limit and market order sizes are obtained by sampling from the following Gamma mixture 


$$
O^{LO,k,s}_{i,t} \sim w \, \mathrm{Gamma}(\kappa_1, \theta_1) + (1 - w) \, \mathrm{Gamma}(\kappa_2, \theta_2), \quad \forall \, i, t, k, s \tag{21}
$$

where

$$
\mathrm{Gamma}(O; \kappa, \theta) = \frac{1}{\Gamma(\kappa) \, \theta^{\kappa}} \, O^{\kappa - 1} \exp\left(-\frac{O}{\theta}\right) \tag{22}
$$

with positive shape parameters $\kappa_1, \kappa_2$ and positive scale parameters $\theta_1, \theta_2$. We set $\kappa_1 = 1$, $\kappa_2 = 2$ as we observed
there was a mode present in the empirical distributions of order sizes and we estimated the scale parameters for
each mixture component to place the mode in the appropriate locations. Hence, we additionally estimate the
parameters $\theta_1, \theta_2$ and the mixture weight $w$.
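
As an illustration of the Gamma mixture above, the following is a minimal sketch of how order sizes could be sampled, assuming shape parameters 1 and 2 as in the text. The function name `sample_order_size` and the numerical values of the weight and the scales are hypothetical and chosen only for illustration; they are not estimates from the calibration.

```python
import random


def sample_order_size(w, theta1, theta2, kappa1=1.0, kappa2=2.0, rng=random):
    """Draw one order size from w*Gamma(kappa1, theta1) + (1-w)*Gamma(kappa2, theta2).

    random.gammavariate takes (shape, scale), matching the (kappa, theta)
    parameterisation of the density in the text.
    """
    if rng.random() < w:
        return rng.gammavariate(kappa1, theta1)
    return rng.gammavariate(kappa2, theta2)


rng = random.Random(0)

# Hypothetical parameter values (assumptions, not calibrated estimates).
w, theta1, theta2 = 0.3, 50.0, 100.0

sizes = [sample_order_size(w, theta1, theta2, rng=rng) for _ in range(100_000)]
mean_size = sum(sizes) / len(sizes)

# Mean of the mixture: w * kappa1 * theta1 + (1 - w) * kappa2 * theta2.
theoretical_mean = w * 1.0 * theta1 + (1 - w) * 2.0 * theta2
```

With a shape of 1 the first component is an exponential distribution, while the second component has its mode at its scale parameter, so the two scales control where the bulk and the mode of the simulated order sizes fall.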

We run the stochastic optimization framework using the same settings (a parameter population of 40 candidate
solutions and an evolution over 40 generations) and calibrate the relaxed reference model, with the stochastic
model for the order sizes, to the same data set used in the reference model fit, i.e. the LOB data for BNP Paribas
over an entire day. We obtain a Pareto optimal front which again contains multiple parameter vector solutions
spread out over the front, indicating a successful exploratory search by the genetic search framework.
Importantly, the realized objective function values for the relaxed reference model, shown in the accompanying
figure of objective function values, are clear improvements on those achieved by the basic reference model, in
which the order sizes were fixed.


Figure 10 shows the intensity of the volume process and the evolution of the spread for a simulated trading
day for 2 of these parameter vectors selected from the Pareto optimal front. Similarly to the reference model,
the price and volume trajectories are still quite variable between the different feasible, Pareto optimal solutions
obtained for this calibration.

5.3.2 Introducing asymmetry and skewness to Limit Order intensity by depth 

In the reference model, we assumed that the skewness parameter vector $\gamma$ of the multivariate skew-t distribution,
assumed for the number of limit orders and cancellations at each level of the LOB, was fixed to a common skew.
This parsimonious choice was encoded in the model by setting every element of the skewness vectors to a single
value $\gamma_0$, i.e. there was only one skewness parameter which was common to all levels on both the bid and
ask. The effect of this assumption on the price and volume dynamics in the reference model is now assessed by
relaxing this feature and performing calibration of a relaxed version of the reference model to the same day of
data from BNP Paribas.

We now allow $\gamma^{LO,a} = \gamma^{LO,b} = (\gamma_0^{LO,-l_d+1}, \ldots, \gamma_0^{LO,l_p}) = \gamma^{C,a} = \gamma^{C,b}$ in order to gain additional
flexibility in modelling the skewness in the multivariate counts for limit order and cancellation data. We also
allow $\gamma^{MO,a} = \gamma^{MO,b} = \gamma_0^{MO}$ to enable the skewness of the market order data to be modelled separately. This
will entail estimating an additional $l_d + l_p$ parameters. Again, we assess whether the Pareto optimal solutions
improve in minimizing the objective functions under this relaxation of the constraints in the reference model
assumptions.

The table of non-dominated solutions for this relaxed model shows that in none of the parameter vectors produced
by the multi-objective II estimation method are the elements of the skewness vector close to being equal to one
another. This indicates that the use of skew vectors with a different skew at each level of the LOB for the bid and
ask, in the Multivariate Skew-t distribution, is appropriate for the calibration to real data. As expected,
incorporating these features improves the model power and suitability, as measured by the objective function
values achieved by the solutions in the Pareto optimal front for the simulated stochastic agent LOB model
realizations, when compared to the reference model.


6 Regulatory interventions via the SR-ABM stochastic LOB agent model

In building our stochastic agent-based LOB simulation model, we were motivated by the increasing desire of
regulators, exchanges and brokerage houses to better understand the role of intervention in electronic exchanges.
In this regard, there has been a sequence of new regulations instigated throughout Europe and the US to
further manage the processing, placement and clearing of trades in electronic exchanges^.

The Markets in Financial Instruments Directive (MiFID) aims to develop a harmonised regulation for
investment services across the 31 member states of the European Economic Area. Several components of MiFID
can be better understood by the type of analysis we undertake in this paper. For instance, one component
pertaining to brokerage houses is the part of this directive known generically as ‘Best Execution’ practice^.
Under this feature of the directive, MiFID will require that firms take all reasonable steps to obtain the best
possible result in the execution of an order for a client. The best possible result is not limited to execution price
but also includes cost, speed, likelihood of execution and likelihood of settlement and any other factors deemed
relevant. This directive therefore speaks directly to liquidity in the LOB and the need to develop a better
understanding of which features and market behaviours by agents in the market affect liquidity either in volume
or price. An intrinsic part of this process is the consideration of volumes at different levels of the LOB.

^These regulations, which in Europe fall under the ‘Lamfalussy Directives’, include the Prospectus Directive, the Market Abuse
Directive, the Transparency Directive and the Markets in Financial Instruments Directive (MiFID).

^MiFID's best execution regime is set out as follows in the Directives. Article 21 of Level 1 and Articles 44 and 46 of Level 2
set out the requirements for investment firms that provide the service of executing orders on behalf of clients for MiFID financial
instruments and, indirectly via Article 45(7), for investment firms that provide the service of portfolio management, when executing
decisions to deal on behalf of client portfolios.

In addition to developing a better understanding of the LOB stochastic process, regulators also have an
important role to play in trying to determine how best to manage certain types of potentially undesirable market
behaviours by agents. In this regard, we refer to behaviours that may be disruptive, or that may cause excess
volatility in price or illiquidity throughout the trading day in a given asset’s LOB.


6.1 Related ABM studies of regulatory interventions 


The introduction of MiFID has increased competition and allowed for the trading of stocks in pan-European
multilateral trading facilities (MTFs). The trading on one venue will undoubtedly affect the trading interest in
another, through the activity of cross-market arbitrageurs. In addition, there is the possibility that regulation
can be imposed on one market, but not another, which will have implications for the efficacy of the regulation
itself. Both Mannaro et al. 2008 and Westerhoff and Dieci 2006 have considered this in the context of an ABM,
but with simpler models than the one considered here, which do not take into account the liquidity considerations
of the agents. We extend these studies using the stochastic agent-based LOB simulation model developed in this
manuscript.

In contrast to the ABM of Westerhoff 2003, set up to study the effect of a transaction tax in a financial
market, in our model the agents’ strategy is not dependent on profitability. This is because of the division of our
trading agents according to their liquidity considerations: traders often consume liquidity due to considerations
other than profit, such as rebalancing the weights of their holdings in a fund. They cannot simply choose
to become liquidity providers because of the superior profitability of these agents, for a number of reasons.
These include the investment in technology required to be able to carry out such a strategy in the millisecond
environment, the inventory they will be required to hold, and, possibly, regulatory or exchange obligations they
will have to adhere to.


6.2 Quote-to-trade ratio 


The intervention we will consider here, as an example of the type of experiment that can be performed using our
model, is the imposition of a quote-to-trade ratio. Such a ratio is already in place on certain exchanges, such
as the LSE, which allows for 500 quotes per trade. Further quotes are allowed in the case of the LSE, but are
subject to a 5 pence surcharge for every order^. In our model, we have made the assumption that the baseline
limit order submission (or quote) intensity at every level is equal to the baseline cancellation intensity.
That is, potentially all orders submitted in an interval can be cancelled prior to execution.

Given the setup of our model, it is more convenient to enforce a stochastic limitation for excessive trading,
rather than a hard limit of (say) 100 limit orders to 1 market order. For a quote-to-trade ratio $q$, we impose
the limit by specifying that the baseline cancellation intensity is a fraction $(1 - q)$ of the baseline limit order
submission intensity. This is an approach similar in concept to that taken by Aït-Sahalia and Saglam, who,
rather than enforcing a strict minimum resting time of 500 milliseconds, instead subject every order to a random
minimum resting time that is exponentially distributed, but with the same mean.
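
To make the idea of a stochastic, rather than hard, limitation concrete, the sketch below cancels each submitted limit order independently with probability 1 - q. This independent thinning is an assumption made for illustration: it stands in for scaling the cancellation intensity in the full model and is not the model's actual sampling scheme. The function name `simulate_cancellations` and the value q = 1/100 are hypothetical.

```python
import random


def simulate_cancellations(n_limit_orders, q, rng):
    """Count cancellations when each order is cancelled with probability 1 - q."""
    return sum(1 for _ in range(n_limit_orders) if rng.random() < 1.0 - q)


rng = random.Random(1)

q = 1 / 100          # hypothetical ratio: about 100 quotes per surviving order
n_orders = 200_000   # hypothetical number of limit orders submitted in a day

cancelled = simulate_cancellations(n_orders, q, rng)
surviving = n_orders - cancelled

# Realised number of quotes per order that is not cancelled.
realised_ratio = n_orders / surviving
```

In any single interval the realised ratio fluctuates around 1/q instead of being capped exactly, which is the sense in which the limitation is stochastic.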


^http://www.londonstockexchange.com/products-and-services/trading-services/prlcespolicies/tradingservicespricelisteffective2december2013.pdf

We evaluate the outcome of such an intervention in our simulated LOB for 3 different quote-to-trade ratios $q$.
Figure 13 shows the effect of the regulation on individual realisations of daily activity,
as well as the price process in repeated realisations. We have chosen one of the parameter vectors from the
estimation of the basic model which generally showed excessive volatility. We note that, in our model, increasing
$q$ (and thus, reducing the relative number of cancellations) has the effect of constraining the mid-price process,
and thus, curbing excess volatility.

While one cannot draw definite conclusions about the effect of such an intervention through an ABM
simulation, it is a step a regulator may consider, particularly when comparing different approaches. Even in the
implementation of a quote-to-trade ratio, the regulator may have a number of choices, for example regarding the
period over which the ratio is considered. We argue that our model can be informative for such considerations
and, given its flexibility, can give rise to a large number of computational experiments and scenario analysis
studies that will better inform policy makers of the impact their policies may have on the market behaviours
of traders.


7 Conclusion 


We have presented a new form of agent-based model, in order to capture features of the complex stochastic
process that is the Limit Order Book. The agent types we considered are representative of the classes of market
participants in modern financial markets: in electronic LOBs, traders can be broadly separated according to their
liquidity requirements, into liquidity providers and liquidity demanders. This is certainly more representative
of the motivation for trading activity, compared to the chartist and fundamentalist models considered in the
past (e.g. Farmer and Joshi 2002, Manzan and Westerhoff 2007, Westerhoff and Reitz 2003).


We have modelled the activity resulting from the entire class of agents, which has enabled us to directly model
the dependence in event (limit order submission, cancellation and market order) activity between the different
levels of the LOB, which would not have been possible by considering simpler formulations for individual agent
strategies. We have employed a flexible Multivariate Skew-t model for the event intensities, which is unique for
its ability to capture asymmetric and heterogeneous dependence, and its scalability in high dimensions (Demarta
and McNeil 2005, Fung and Seneta 2010). This has resulted in a very general formulation of the ABM, which
also enables one to model the heterogeneity in order sizes.

In our estimation of the model, we proposed an extension to standard simulation-based approaches, considering 
multiple auxiliary models (relating to the price and volume processes) in a multi-objective problem. We developed 
a novel Indirect Inference multi-objective optimisation method which uses the concepts of stochastic ordering and 
Pareto optimality to select most suitable candidate parameter vector solutions when calibrating the stochastic 
agent LOB model. 

We have shown that even a parsimonious, baseline version of the model, which assumes fixed order sizes and
no heterogeneity in the skewness of the distribution intensities for limit order placements and cancellations, is
still able to produce a range of plausible LOB stochastic dynamic behaviours. Relaxing the baseline
model assumptions, however, generally leads to an improvement in the model estimates, in terms of their ability
to produce simulations that closely reflect the price and volume dynamics observed in real data on a typical, i.e.
non-eventful in terms of shocks or liquidity droughts, trading day.

The flexible LOB framework presented here, coupled with the proposal of a new simulation-based estimation
method, is an important contribution towards LOB modelling. We have proposed a model that can capture rarely
studied LOB features, such as the dependence in the intensity of LOB activity at different levels. In addition, we
have shown that the model is sufficiently detailed that one can use it as a testbed for proposed regulation.
We demonstrated that a sufficiently high stochastic limitation on the number of cancellations, which would be
similar to the imposition of a quote-to-trade ratio, can reduce excessive volatility in our simulated LOB.

A Adaptive genetic evolutionary search for multi-objective optimisation


A search strategy is also required to explore the parameter space in seeking Pareto optimal sets of parameters
for the agents, i.e. liquidity provider and liquidity demander parameter vectors in the stochastic LOB model. In
this regard, one may consider a multi-objective evolutionary algorithm (MOEA) framework. Such approaches
have been the focus of extensive study over the past 15 years (see, e.g. Zhou et al. 2011, Eiben and Smith
2003, and references within) and would be particularly applicable to the problem at hand. There are several
reasons for their popularity: they are inherently parallel, they feature operators to combine and mutate candidate
solutions to rapidly arrive at improved solutions and they are able to capture multiple Pareto-optimal solutions
during the optimisation (Zitzler et al. 2000), which can be spread out across the Pareto front. In addition, there
have been recent advances to better understand the relationship between such optimisation search frameworks and
stochastic genetic search methods, see for instance discussions in Emmerich et al. 2013. In this paper, we explore
the utilisation of adaptive mutation kernels in the simulation based Multi-objective-II framework to efficiently
explore the parameter space, where our approach merges traditional genetic search algorithms with adaptive
Markov kernels utilised in adaptive MCMC methods, such as those studied in Haario et al. 2006, Roberts and
Rosenthal 2009 and Andrieu et al. 2006.


The MOEA used in this paper is based on the NSGA-II (Non-dominated Sorting Genetic Algorithm II),
developed by Deb et al. 2002. This is an elitist MOEA which, in every iteration, combines the best parent
solutions with the best offspring to produce a new family of candidate solutions. It produces a diverse
Pareto-optimal front (i.e. the solutions are well spread out across the front, due to the algorithm’s use of a crowding
distance operator) with low computational requirements ($O(mN^2)$ computational complexity, where $m$ is the
number of objectives, and $N$ is the population size).

The algorithm is perhaps the most popular MOEA and is frequently used as a performance benchmark for
other algorithms (Coello et al. 2007). It has been used in various applications, including the generation expansion
planning problem in power systems (Kannan et al. 2009) and for balancing objectives in groundwater monitoring
designs (Reed and Minsker 2004). In addition, it has been further developed in a Bayesian setting, in order
to solve discrete multi-objective decomposable problems (see, e.g. Khan 2003, Khan et al. 2002, Laumanns
and Ocenasek 2002). Within this algorithm, we extend the features by also incorporating an adaptive global
and local mutation kernel for a subset of the stochastic agent-based LOB model parameters $\theta$. We first present
an overview of the optimisation algorithm structure:


1. First, a family, or population, of N candidate solutions is initialised randomly from the feasible region. 

2. For each solution, the objective functions are calculated and a rank is obtained reflecting Pareto dominance.
That is, solutions are sorted into fronts, with the first front consisting of solutions that are not dominated
by any other solutions, the second consisting of solutions that are dominated only by solutions in the first
front, and so on. Solutions are also assigned a crowding distance value, indicating the distance from
neighbouring solutions on the same front.


3. From this family of solutions, the crowding comparison operator is applied, and chooses the best solutions 
according to their rank, and in the case of ties, according to the crowding distance value. 

4. Then, one or more evolutionary operators (detailed in the following section) are applied to evolve the 
selected set of solutions. 


5. The new solutions are combined with the current family of solutions and the process is repeated from the 
second step, for a set number of iterations. 

The algorithm outputs the non-dominated set of solutions with the highest ranking. We provide details about
the operators used in the multi-objective Indirect Inference procedure in the following section.
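
The ranking and selection steps listed above can be sketched as follows. This is a minimal illustration written for clarity rather than speed, and it is not the implementation used for the results in this paper: the function names are hypothetical, all objectives are assumed to be minimised, and the crowding distance follows the usual NSGA-II convention of summing normalised gaps between neighbours in each objective.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Return the fronts as lists of indices into objs, best front first."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance for each index in front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


def select(objs, n_keep):
    """Keep n_keep solutions by front rank, breaking ties by crowding distance."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= n_keep:
            chosen.extend(front)
        else:
            dist = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[: n_keep - len(chosen)])
            break
    return chosen


# Hypothetical objective values for five candidate parameter vectors.
objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
fronts = non_dominated_sort(objs)
survivors = select(objs, 2)
```

In this toy example the first three points are mutually non-dominated, and when only two can be kept the two boundary points of that front are preferred, which is how the crowding distance keeps the front spread out.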


A.1 Algorithm settings and evolutionary operators


Details of a large number of evolutionary operators used in MOEAs can be found in Coello et al. 2007. In
NSGA-II, one has to first select the size of the population of candidate solutions for every iteration of the
algorithm, in addition to the number of iterations (called generations in the MOEA nomenclature). In our
optimisation, we use a population size of $N = 40$ parameter sets, and run the optimisation for a total of 40 generations.

We referred to a number of operators used to evolve and choose amongst the set of solutions, and we provide 
further information here about their function: 


• Selection operator: From the second iteration of the algorithm onwards, there will be 2N sets of candidate 
solutions in step 3. The best N solutions are chosen based on a) dominance and b) crowding distance, or 
the distance of the solution from its neighbours. If the number of solutions on the first front is less than N, 
they are all selected, and the remainder are taken from further fronts. In the case where one must select 
fewer solutions than the number of solutions on a particular front, the solutions with the highest crowding 
distance value are chosen. 


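• Crossover operator: The Simulated Binary Crossover (SBX) operator is used. From two candidate
solutions $\theta_1, \theta_2$, two new solutions $\theta'_1, \theta'_2$ are formed, where the $k$-th elements are as follows:

$$
\theta'_{1,k} = \frac{1}{2}\left[(1 - \alpha_k)\,\theta_{1,k} + (1 + \alpha_k)\,\theta_{2,k}\right] \tag{23}
$$

$$
\theta'_{2,k} = \frac{1}{2}\left[(1 + \alpha_k)\,\theta_{1,k} + (1 - \alpha_k)\,\theta_{2,k}\right] \tag{24}
$$

Here, $\alpha_k$ is a random sample from a distribution with density

$$
p(\alpha) =
\begin{cases}
\frac{1}{2}(\eta_c + 1)\,\alpha^{\eta_c} & \text{if } 0 \le \alpha \le 1 \\
\frac{1}{2}(\eta_c + 1)\,\dfrac{1}{\alpha^{\eta_c + 2}} & \text{if } \alpha > 1
\end{cases}
$$

We use the crossover operator with probability 0.7 and a distribution index $\eta_c = 5$.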

• Mutation operator: The polynomial mutation operator is used. The mutation operator perturbs elements
of the solution, according to the distance from the boundaries.

$$
\theta'_k = \theta_k + \delta_k\,(\theta_{k,u} - \theta_{k,l})
$$

where we have for $\delta_k$

$$
\delta_k =
\begin{cases}
(2\gamma_k)^{\frac{1}{\eta_m + 1}} - 1 & \text{if } \gamma_k < 0.5 \\
1 - \left[2(1 - \gamma_k)\right]^{\frac{1}{\eta_m + 1}} & \text{if } \gamma_k \ge 0.5
\end{cases}
$$

Here, $\gamma_k$ is uniformly distributed on (0,1) and the distribution index $\eta_m = 10$. The polynomial mutation
operator is used with probability 0.2.
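
The crossover and mutation operators described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used for the calibration: the function names are hypothetical, the spread factor is sampled by inverting the distribution function of the standard SBX density, mutated values are clipped to the parameter bounds (an added safeguard), and the operator probabilities of 0.7 and 0.2 are not applied here.

```python
import random


def sample_alpha(eta_c, rng):
    """Sample the SBX spread factor by inverting its distribution function."""
    u = rng.random()
    if u <= 0.5:
        return (2.0 * u) ** (1.0 / (eta_c + 1.0))
    return (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))


def sbx_crossover(theta1, theta2, eta_c=5.0, rng=random):
    """Form two children element by element from two parent vectors."""
    child1, child2 = [], []
    for a, b in zip(theta1, theta2):
        alpha = sample_alpha(eta_c, rng)
        child1.append(0.5 * ((1 - alpha) * a + (1 + alpha) * b))
        child2.append(0.5 * ((1 + alpha) * a + (1 - alpha) * b))
    return child1, child2


def polynomial_mutation(theta, lower, upper, eta_m=10.0, rng=random):
    """Perturb each element by a polynomially distributed fraction of its range."""
    out = []
    for x, lo, hi in zip(theta, lower, upper):
        g = rng.random()
        if g < 0.5:
            delta = (2.0 * g) ** (1.0 / (eta_m + 1.0)) - 1.0
        else:
            delta = 1.0 - (2.0 * (1.0 - g)) ** (1.0 / (eta_m + 1.0))
        # Clipping to the bounds is an assumption added for this sketch.
        out.append(min(max(x + delta * (hi - lo), lo), hi))
    return out


rng = random.Random(42)

# Hypothetical parent vectors and parameter bounds.
p1, p2 = [1.0, 10.0], [3.0, 20.0]
lower, upper = [0.0, 0.0], [5.0, 30.0]

c1, c2 = sbx_crossover(p1, p2, rng=rng)
mutated = polynomial_mutation(c1, lower, upper, rng=rng)
```

By construction the two children of the crossover have the same element-wise sum as their parents, so the operator redistributes the parents' values around their midpoint; larger distribution indices concentrate children and mutations closer to the parents.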

Covariance matrix mutation and sampling: The NSGA-II algorithm discussed above is only able to produce
binary, integer, or real encodings for the output solution vectors. However, the stochastic process for the limit
order submission activity by liquidity providers requires the specification of a positive definite and symmetric
covariance matrix for the generation of intensities from a multivariate skew-t distribution. We cannot naively
extend the evolutionary operators above (crossover and mutation) to produce new sets of covariance matrix
candidate solutions which guarantee that the positive definiteness and symmetry constraints of the covariance
matrix are preserved. We thus propose an extension to the MOEA, effectively another operator that will generate
candidate solutions for the covariance matrices, such that every new generation remains in the manifold of positive
definite matrices. This operator will generate new candidate covariance matrices once the evolutionary operators
discussed previously have been applied.

To ensure that the optimisation algorithm searches the space of feasible solutions efficiently and does not
get stuck in a suboptimal region of the space of possible solutions, our covariance matrix sampling operator has
two components to undertake exploration and exploitation type moves. The mutation kernel is comprised of a
mixture of Inverse Wishart distributions with different parameters, as per the proposal of Peters et al. 2012:
one mixture component to provide global search (exploration) and a second mixture component to provide local
searches (exploitation). To do this efficiently, it is based on an adaptive learning strategy for the specification of
the local mixture component. In this case, the algorithm will explore the local region with high probability, but
make potentially larger moves with smaller probability.

We now describe one complete covariance mutation step. In the $n$-th generation of the MOEA, we generate
$\Sigma_{n,i}$, $i = 1 \ldots N$ from a mixture distribution $q(\Sigma_{n,i})$ defined as follows:

$$
q(\Sigma_{n,i}) = (1 - w_1)\,\mathcal{IW}(\Psi_n, p_1) + w_1\,\mathcal{IW}(\Psi, p_2)
$$

where $p_1, p_2$ are degrees of freedom parameters with $p_2 < p_1$, and where $w_1$ is small so that sampling from the
second distribution happens infrequently. Here $\Psi$ denotes an uninformative positive definite matrix, with the
effect that sampling from the second distribution leads to moves away from the local region being explored. $\Psi_n$ is
also a positive definite matrix, fitted based on moment matching to the sample mean of the successfully proposed
candidate solutions in the previous stages of the Multi-Objective optimisation. This sample mean is a weighted
average over the solutions of earlier generations, where the weights involve $r_{t,i}$, the non-domination rank of the
$i$-th solution in the $t$-th generation, and an exponential weighting factor with base less than 1.


B Further results 

In Sections 5.2 and 5.3 we presented results for the calibration of the reference model and of models where we
relaxed certain assumptions, respectively. This calibration was performed using the data from a single asset
(BNP Paribas) over one day, in order to be able to present detailed results regarding objective function values,
LOB evolution over individual simulations using individual solutions on the Pareto front, as well as summaries of
repeated simulations. In this section, we repeat the calibration of the reference model for 5 assets (BNP Paribas,
Credit Agricole, Total SA, Technip SA and Sanofi) on every trading day between 01/02/2012 and 21/02/2012.
The stocks were chosen from the French CAC40 stocks, and are therefore amongst the most liquid stocks in
the country. Specifically, we chose assets that are representative of different industries (banking, energy and
pharmaceutical) and have different tick sizes (minimum price increments) and market capitalisations, as these are
factors that affect daily trading activity.

We summarise the results as follows: We first calibrate the reference model for each day and each asset
individually, from which we obtain a set of $J$ solutions (i.e. non-dominated solutions on the Pareto front) every
time. For each solution (parameter vector $\theta_j$, $j \in 1 \ldots J$), we simulate the LOB model $N = 50$ times and fit the
auxiliary models to the simulated data to obtain, for each simulation $i \in 1 \ldots N$, two auxiliary model parameter
vectors. The first holds the ARIMA model parameters fit to the volume process on the bid and ask side, and the
second holds the GARCH model parameters fit to the log returns.

We can then construct the empirical distribution for each parameter in these vectors, and determine the 95%
confidence interval. From this, we can determine whether the parameter coefficients of the auxiliary model fit to
the real data lie within this range, for each solution on the Pareto front. In Figures 14 and 15 we show for each
day, each asset and each auxiliary model parameter, the proportion of solutions on the Pareto front for which
the coefficients of the auxiliary model fit to the real data lie within the 95% confidence interval of the coefficients
of the auxiliary model fit to the simulated data.
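
The coverage proportion described above can be computed as in the following sketch for a single auxiliary model coefficient. The function names and all numbers are hypothetical, the interval is taken as the central 95% range of the simulated coefficients using linear-interpolation percentiles (one of several reasonable conventions), and only 5 simulations per solution are used here to keep the example short, rather than the 50 used in the text.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list, p in [0, 100]."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def covered(real_coef, simulated_coefs, level=95.0):
    """True if real_coef lies inside the central interval of the simulated values."""
    vals = sorted(simulated_coefs)
    tail = (100.0 - level) / 2.0
    return percentile(vals, tail) <= real_coef <= percentile(vals, 100.0 - tail)


def coverage_proportion(real_coef, sims_per_solution):
    """Share of Pareto front solutions whose interval contains the real coefficient."""
    hits = [covered(real_coef, sims) for sims in sims_per_solution]
    return sum(hits) / len(hits)


# Hypothetical coefficient fit to real data, and simulated coefficients
# for J = 3 solutions on the Pareto front.
real = 0.30
sims = [
    [0.25, 0.28, 0.31, 0.35, 0.40],
    [0.50, 0.55, 0.60, 0.62, 0.70],
    [0.10, 0.20, 0.30, 0.33, 0.36],
]
proportion = coverage_proportion(real, sims)
```

Here the real coefficient falls inside the interval for the first and third solutions but not the second, giving a proportion of two thirds for this coefficient.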

We note that the proportion varies over time, as one would expect, as not all solutions on the Pareto front will
give rise to LOB dynamics that closely reflect those observed in real data. However, we note that this proportion
is generally more than 25% for most parameters and most days. Thus, within the set of solutions produced by our
estimation procedure, there is a subset that produces simulations similar to real trading observations
in terms of their price and volume behaviour, which are the summaries of the LOB to which our auxiliary models
relate.

Tables


Table 1: Non-dominated solutions after 40 iterations, with a population size of 40. 



      $\mu_0^{LO,p}$  $\mu_0^{LO,d}$  $\mu_0^{MO}$  $\gamma_0$  $\nu_0$  $\sigma_0^{MO}$  $\mathrm{Tr}(\Sigma)$

 1     30.84     8.16     4.75    -0.18    33.70     1.78     7.11
 2     31.16     5.13     4.41     9.96    28.07     9.95     4.60
 3     31.16     5.13     4.41     9.96    21.74     9.95     5.52
 4     29.82     5.24     4.45    -0.52    20.57     4.70     4.26
 5     46.87     7.42     4.77     0.64    28.34     8.81     6.82
 6     22.05     8.18     8.13    -1.68    24.65     1.83     5.70
 7     19.83     5.41     0.68    -0.28    28.85     2.15     5.25
 8     12.95     3.12     2.93     2.35    35.25     3.58     6.01
 9     30.84     8.16     4.75    -0.18    33.70     1.78     5.14
10     31.16     5.13     4.41     9.96    28.07     9.95     5.20
11     31.16     5.13     4.41     9.96    21.74     9.95     7.30
12     29.82     5.24     4.45    -0.52    20.57     4.70     4.32


Table 2: Non-dominated solutions for the model where the elements of the skewness vector are allowed to vary. 




LO,d 

/^O 



Co 

^MO 

c 

to 

7o 

LO,0 

7o 

LO,l 

7o 

L0.2 

7o ' 

LO,3 

7o 

LOA 

7o 

LOA 

7o 

Tr { E ) 

1   39.35   4.00   0.54  -7.36  46.24   7.89   4.32  -7.30   1.89  -4.49  -7.86   4.51   4.78  -6.72   7.97
2   38.48   3.98   5.81  -1.35   8.63   8.21   7.41   4.35   7.47  -6.86  -2.29   1.16   4.74  -5.77   5.82
3   39.54   3.39   0.54  -6.46  46.24   7.89   4.32  -7.30   3.13  -4.49  -7.86   2.67   4.78  -6.72   6.41
4   11.33   2.56   2.53   6.90   3.14   1.98  -3.32  -3.55  -8.42  -3.32  -5.53  -4.10  -4.58   4.40   5.53
5   37.61   3.98   1.16  -1.35   2.59   8.21   4.40  -7.50  -6.64  -7.61  -7.56   1.20   4.78  -5.11   7.41
6   18.25   4.00   1.05  -2.34   2.52   8.21  -0.05  -8.93  -3.35  -7.37  -7.43   3.67   2.81  -2.61   5.67
7   13.40   5.97   6.14  -1.25  22.02   8.69   6.11  -1.65  -6.36  -8.16  -2.75   3.34   8.76   6.81   6.42
8   39.35   4.00   0.71  -6.44   5.67   2.25  -3.25  -7.34   1.89  -4.47  -7.86   4.51   4.78  -7.18   4.42

Figure 1: The actively modelled levels of the LOB in the agent-based model presented in this paper. There are a
total of $l_t$ levels on each side, where $l_p$ are passive levels and $l_d$ are direct, or aggressive levels (i.e. would lead to
immediate execution). The levels of the ask are considered around the best bid price at the start of each interval,
and likewise the levels of the bid side are considered around the best ask price at the start of each interval. In
this figure, as in our model, we have $l_p = 5$ and $l_d = 3$.

Figure 2: Correlation in the LOB limit order submission intensities on the bid side of the LOB in 10 second
intervals, with the levels defined with respect to the best ask price. $l_1$ to $l_5$ denote passive orders (i.e. priced
above the reference price) and $l_0$ to $l_{-2}$ denote aggressive or direct orders (priced at or below the reference price,
for immediate execution if the reference price had remained constant). The data set considered here is the daily
LOB activity for stock BNP Paribas on 17/01/2012.

Figure 3: One-minute log returns for stock BNP Paribas on a typical day.

Figure 4: Top row subplots: Total sell volume resting in the LOB in the first 5 ticks away from the best bid
(left), total buy volume resting in the first 5 ticks away from the best ask (right) for stock GDF Suez on a typical
day. Middle row subplots: First differences of the figures above. Bottom row subplots: Sample ACF and PACF.

Figure 5: For stock BNP Paribas, the intensity of the volume process on either side, where the shading of each
bin indicates the average number of shares available at those prices in that period. The plot on the right shows
the evolution of the spread throughout the trading day.
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Figure 6: Objective function values for the parameter vectors produced by the multi-objective II method. These 
are grouped by non-domination rank, with a rank of I indicating non-dominated vectors, a rank of 2 indicating 
vectors dominated only by a single other vector and so on. Note that the points in each group form a Pareto 
front, a feature of the optimisation. 
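As an illustration of the grouping in Figure 6, the sketch below assigns non-domination ranks to a handful of objective vectors. It assumes that both objectives are minimised and uses the front-peeling definition common in NSGA-II implementations: rank 1 holds the non-dominated vectors, and rank k holds those that become non-dominated once all lower ranks are removed. This may differ in detail from the counting description in the caption. The function names and sample points are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_domination_ranks(points: List[Sequence[float]]) -> List[int]:
    """Peel off successive Pareto fronts; rank 1 is the non-dominated set."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # The current front: points not dominated by any other remaining point.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (objective 1, objective 2) values, both to be minimised.
objective_values = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
ranks = non_domination_ranks(objective_values)
```

This pairwise version is quadratic in the number of points per front, which is adequate for the population sizes typical of such optimisations; faster bookkeeping schemes exist but are not needed for illustration.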
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Figure 7: Simulations using 2 of the non-dominated parameter vectors resulting from estimating the basic model 
with NSGA-II. The figures on the left are heatmaps of the asset mid price over 100 simulations, while the figures 
on the right represent the state of the LOB over a single simulation. 


[Plot: axis labels "count" and "Objective function 1".] 
Figure 9: Objective function values for the parameter vectors produced by the multi-objective II method, in the 
case where we assume that order sizes follow a mixture of gamma distributions. 
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Figure 10: Simulations using 2 of the non-dominated parameter vectors resulting from estimating the basic model 
with NSGA-II, but assuming that order sizes are drawn from a mixture of gamma distributions. 
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Figure 11: Objective function values for the parameter vectors produced by the multi-objective II method, in 
the case where we relax the assumption that the elements of the skewness vector in the Multivariate Skew-t 
distribution are equal. 
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Figure 12: Simulations using 2 of the non-dominated parameter vectors resulting from estimating the basic model 
with NSGA-II, but relaxing the assumption that the elements of the skewness vector in the Multivariate Skew-t 
distribution are equal. 
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Figure 13: Simulations of the basic model, with the addition of a ‘quote-to-trade ratio’ regulatory intervention. 
The mid-price process and daily LOB volumes for three values of the quote-to-trade ratio q, shown from top to bottom. 
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Figure 14: The proportion of solutions on the Pareto front for which the coefficients of the auxiliary model fit to 
the real data lie within the 95% confidence interval of the coefficients of the auxiliary model fit to the simulated 
data, for each trading day between 01/02/2012 and 21/02/2012 for 5 different stocks. (Left): BNP Paribas. 
(Right): Credit Agricole. 
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Figure 15: The proportion of solutions on the Pareto front for which the coefficients of the auxiliary model fit to 
the real data lie within the 95% confidence interval of the coefficients of the auxiliary model fit to the simulated 
data, for each trading day between 01/02/2012 and 21/02/2012 for 5 different stocks. (Left): Total SA. (Right): 
Technip SA. (Bottom): Sanofi. 
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